1. Introduction


The goal of this paper is to understand the quality of outcomes of games and simple mechanisms
in a dynamic environment. The Internet allows for the repeated strategic interaction of many
entities with constantly changing parameters and participants. Primary examples of such interactions
include online advertising auction platforms, packet routing and allocation of cloud computing
resources. Understanding whether the constant change in these strategic environments can severely
damage the efficiency of the corresponding system, as compared to the hypothetical centralized
optimum, is of prime importance as these systems constitute the cornerstone of the online economy.
For example, advertising provides close to 90% of Google’s revenue (Google 2015).

Classical economic analysis of the interaction of strategic agents assumes that players reach a
stable outcome where all players are mutually best-responding to each others’ actions (or considers
mechanisms that are dominant strategy solvable). Dynamic environments, with high volume
interactions of small individual value or cost, such as packet routing or ad-auctions, are better
modeled as repeated games with learning players. Nash equilibria of the one-shot game correspond
to stable outcomes repeated in each iteration, where the players have no regret for their choice of
strategies. Hence, analyzing the quality of outcomes in repeated games via the price of anarchy
(Koutsoupias and Papadimitriou 1999) assumes that the repeated game reaches a stable, stationary
outcome. Such an analysis of price of anarchy of one-shot Nash equilibria has received much
attention in the past few years in both the computer science and operations research communities
and in a plethora of application domains such as routing games (Roughgarden and Tardos 2002,
Correa et al. 2003), bandwidth allocation (Johari and Tsitsiklis 2004), strategic supply among
firms (Johari and Tsitsiklis 2011) and online ad-auctions (Caragiannis et al. 2015) (see e.g. Chapters
17 to 21 of (Nisan et al. 2007) for a survey).


A more attractive model of player behavior in such repeated environments is to assume players
use a form of algorithmic learning. Modeling players as learners is especially appealing in online
auctions, as individual auctions provide very little value, costing only a few cents to a few dollars
each, so using experimentation to learn from the data is natural. Many advertisers use sophisticated
optimization tools or services to optimize their bidding, such as BlueKai^3 or AdRoll^4.

It is well known that in most games natural game play does not lead to equilibria, under any
definition of “natural play” (see e.g. Chapter 7 of (Hart and Mas-Colell 2012)). In fact, results on
polynomial time computability of Nash equilibria of general games are mostly negative: finding
equilibria is computationally hard (see (Daskalakis 2009) for a survey).


^3 http://www.oracle.com/us/corporate/acquisitions/bluekai/index.html, accessed 10-03-2015
^4 https://www.adroll.com/, accessed 10-03-2015
Even with computational concerns aside, the game that the participants are playing at each
time-step and the participants they are playing against, can change at any time without even the 
players realizing it or being able to form any distributional belief. Hence, even the concept of a 
Nash equilibrium is debatable in such an adversarially evolving setting, as the players don’t even 
have the information necessary to calculate their expected utility at each time-step. Instead they 
observe their utility from the action they took or from any alternative action they could have taken, 
only after the fact. In such an evolving setting, players can base their actions on past experience. 

A particular class of learning behaviors, no-regret learning, emerged as a nice way to capture 
the intuition that players learn to play appropriate strategies over time without necessitating 
convergence to a stationary equilibrium. A stationary distribution that is also a no-regret learning 
outcome corresponds to a Nash equilibrium of the one shot game, and in this sense, learning 
outcomes generalize Nash equilibrium. More importantly, there are several simple and natural
algorithms that achieve the no-regret property (e.g. regret matching (Hart and Mas-Colell 2000),
multiplicative weight updates (Arora et al. 2012)). However, no-regret does not preclude the use
of possibly much more sophisticated tools, including using the above learning algorithms with
more complex benchmarks. Achieving small regret is a relatively simple expectation from bid
optimization tools.
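
To make the no-regret property concrete, the following is a minimal sketch of a multiplicative weights (Hedge) learner on a made-up loss sequence. The loss sequence, the horizon T, the number of actions N and the textbook step size eta = sqrt(8 ln N / T) are illustrative assumptions, not taken from this paper. With that step size, the standard analysis bounds the regret against the best fixed action by sqrt(T ln N / 2).

```python
import math


def hedge(loss_rows, eta):
    """Run Hedge (multiplicative weights) on loss vectors with entries in [0, 1].

    Returns the learner's expected cumulative loss and its regret against
    the best single action in hindsight.
    """
    n_actions = len(loss_rows[0])
    weights = [1.0] * n_actions
    cumulative = [0.0] * n_actions  # total loss of each fixed action
    expected_loss = 0.0
    for losses in loss_rows:
        total = sum(weights)
        probs = [w / total for w in weights]
        expected_loss += sum(p * l for p, l in zip(probs, losses))
        for a, l in enumerate(losses):
            cumulative[a] += l
            weights[a] *= math.exp(-eta * l)  # penalize lossy actions
    return expected_loss, expected_loss - min(cumulative)


# Hypothetical instance: two actions alternate between loss 0 and 1,
# a third action always has loss 0.4 and is the best fixed choice.
T, N = 1000, 3
loss_rows = [[t % 2, 1 - t % 2, 0.4] for t in range(T)]
eta = math.sqrt(8 * math.log(N) / T)
expected_loss, regret = hedge(loss_rows, eta)
print(f"average regret per step: {regret / T:.4f}")
```

The average regret per step vanishes as T grows, which is the only property of the learner that the price of total anarchy analysis uses.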


Blum et al. (2006, 2008) consider regret-minimization as a model of player behavior in repeated
games, and study the average inefficiency of the outcome, coining the term price of total anarchy
for the worst-case ratio between the optimal objective value and the average objective value when
players use a no-regret algorithm. In a sequence of play all players achieve the no-regret property, if
and only if the empirical distribution of strategy vectors is a coarse correlated equilibrium, hence the
price of total anarchy is the ratio of the socially optimal welfare to the welfare at the worst coarse
correlated equilibrium. Roughgarden (2009) observed that many of the Nash equilibrium price
of anarchy bounds are shown via a proof technique he called smoothness, and such proofs easily
extend also to show bounds on the quality of coarse correlated equilibria. Syrgkanis and Tardos
(2013) extend smoothness to simple mechanisms, such as independent item auctions.

However, this learning outcome analysis is based on the strong assumption that the underlying 
environment and player population is stable. The reason for this requirement is easy to understand: 
with the game and the players stable, there is a fixed optimal solution, and a fixed strategy, that 
each player i would need to play (action a*) as his or her part for achieving the optimum. To 
guarantee high social welfare via the smoothness approach, all we need is that each player i doesn’t 
regret not playing this optimal action a*. No-regret learning guarantees exactly this; player i will 
not regret any fixed strategy with hindsight, including strategy a*. However, online environments 
are typically not stable. 
In this paper, we study learning outcomes in games with a dynamically changing player
population. As stated in (Fudenberg and Levine 1998, p. 4), the fact that players extrapolate across
games they view as similar is an important reason learning has relevance in a real-world situation. 
A repeated game with an evolving population is exactly a setup where players are asked to play 
repeatedly in similar games. Rather than aiming to predict the exact outcome, our goal is to predict 
properties of outcomes, such as their efficiency, i.e. the price of anarchy. 

In a changing game environment, we need a slightly stronger notion of regret minimization. 
No-regret learning aims to select strategies that do at least as well on the average over a sequence 
of steps as the best single strategy would have done in hindsight. With the game environment and 
population changing, a single best strategy in hindsight gives a really weak benchmark. Players, 
using good learning algorithms, should be able to adapt to the changing environment, and such 
adaptation may be very useful with the population changing over time. For example, in the
context of routing games, a player with many route options may want to adjust their route choices
depending on which part of the network is more congested, or in auction games, a player may want
to bid for items that are less in demand.


Hazan and Seshadhri (2007) formally introduced the stronger notion of adaptive regret that we
will use, bounding the average regret over any sub-interval of steps $[\tau_1, \tau_2)$, compared to a single
best action over this interval in hindsight. The study of adaptive learning goes back much further:
the work of Lehrer (2003) and Blum and Mansour (2007) studied generalizations of adaptive regret
prior to (Hazan and Seshadhri 2007). Clearly short intervals will result in relatively high regret with
any learning algorithm, but adaptive learning algorithms guarantee for the player that the
cumulative regret grows sub-linearly with the length of the interval. Most adaptive learning algorithms
are constructed by modifying classical no-regret learning algorithms to stop relying too heavily
on experience from the distant past. We believe that such adaptive learning is a better model of
behavior when strategic agents (such as bidders in online auctions) use sophisticated optimization
tools. The current best adaptive learning algorithm is a natural adaptation of the classical Hedge
algorithm, AdaNormalHedge, due to Luo and Schapire (2015). With this framework in mind, we
ask the following main question:

How much rate of change can a system admit to sustain approximate efficiency, when its
participants are adaptive learners?
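
To illustrate why adaptive regret is a strictly stronger benchmark than standard regret, the sketch below evaluates both on a small hypothetical utility sequence in which the best action switches halfway through. The function names and the instance are assumptions made for illustration; the brute-force search over all intervals is only meant for tiny examples.

```python
def interval_regret(utilities, played, t1, t2):
    """Regret on steps t1, ..., t2 - 1 against the best fixed action there.

    utilities[t][a] is the utility action a would have earned at step t,
    played[t] is the action that was actually chosen.
    """
    n_actions = len(utilities[0])
    earned = sum(utilities[t][played[t]] for t in range(t1, t2))
    best_fixed = max(
        sum(utilities[t][a] for t in range(t1, t2)) for a in range(n_actions)
    )
    return best_fixed - earned


def adaptive_regret(utilities, played):
    """Largest interval regret over all sub-intervals [t1, t2)."""
    T = len(utilities)
    return max(
        (interval_regret(utilities, played, t1, t2), (t1, t2))
        for t1 in range(T)
        for t2 in range(t1 + 1, T + 1)
    )


# Hypothetical environment: action 0 is best for 5 steps, then action 1.
utilities = [[1, 0]] * 5 + [[0, 1]] * 5
played = [0] * 10  # a player who never adapts

standard = interval_regret(utilities, played, 0, 10)
worst, worst_interval = adaptive_regret(utilities, played)
print(standard, worst, worst_interval)
```

The non-adapting player has zero regret over the whole horizon, yet suffers the maximum possible regret on the second half, which is exactly the kind of failure that an adaptive regret guarantee rules out.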

Our Results. We show that in large classes of games, if players choose their strategies in a 
way that guarantees low adaptive regret, this ensures high social welfare, even under surprisingly 
high turnover. To model a changing environment we consider a dynamic player population where 
between every pair of iterations each player leaves independently with a (small) probability p and 
is replaced by an arbitrary new player, implying that in expectation a p fraction of the population 
is replaced. The independent departure probability models churn in player population caused by
effects that are external to the game. We make no assumptions on the sequence of arriving players, 
which can be chosen in an adversarial way. We use independence of departures for simplicity of 
presentation, and most of our results carry over to any process where the departing players are 
also chosen adversarially, subject to a constraint on the number of per-step replacements. This 
model of the environment is simple enough to allow a clean analysis, and allows arbitrary worst 
case shifts in player populations. 

We show that learning behavior ensures high social welfare in dynamic situations with high 
churn for four classes of games: 

• In Section 4.1 we consider an item auction game with unit demand bidders. At each period
the auction sells m different items, and the bidders have value for at most one item per period.
The value of player i for each item j is different and is denoted by $v_{ij}$. We consider a simple auction
format: each item is auctioned independently (via a first or second price auction). We show that
adaptive learning by players ensures high social welfare (i.e. price of anarchy close to 4), even when
the probability p of player departure is close to a constant (independent of the number of items or
players, and depending only on the range of values that players have).

• In Section 4.2 we consider bandwidth allocation: a unit of bandwidth is to be divided across
the players of the game and each player i has a valuation function $v_i(x)$ for bandwidth x. We
consider the proportional mechanism of Kelly (1997), analyzed in Johari and Tsitsiklis (2011), and
show a price of anarchy close to 4 under mild assumptions on the utility functions and even with
high player turnover.

• In Section 5.2 we prove that in large dynamic congestion games learning by players ensures
low social cost even with a dynamically changing player population. For example, when the costs
are a linear function of the congestion, we get a price of anarchy guarantee close to the 5/2 price
of anarchy of the corresponding one-shot atomic congestion game, even if a 1/polylog(n) fraction
of the n players are changing at each time-step.

• In Section 5.3 we consider auction games where bidders have gross substitute valuations.
Extending the results of Section 4.1 we prove that in large dynamic markets, learning by players
ensures high social welfare, i.e. price of anarchy close to 2, even if a 1/polylog(n) fraction of the n
players are changing at each time-step.

We achieve these results by developing a general technique (in Section 3) to show that in many
games adaptive learners achieve high social welfare in dynamically changing environments. Our 
technique is based on the following three conditions: 

1. All players are adaptive learners, i.e. they choose their strategies in a way that guarantees
small adaptive regret on the outcome (for instance, using an adaptive learning algorithm). In
deriving concrete bounds, we assume that players use an adaptive learning algorithm with the best
known bound of (Luo and Schapire 2015) or (Blum and Mansour 2007). Our results deteriorate
gracefully with weaker assumptions on the regret of learning.

2. The game repeated in each step (called the stage game) needs to have low price of anarchy. In
particular, we need that the game satisfies a slight strengthening of the (Roughgarden 2009) smoothness
property (or the smooth mechanisms property of (Syrgkanis and Tardos 2013)), which is typically
used to prove price of anarchy guarantees.

3. There exists a sequence of solutions for the underlying optimization problem that is
approximately optimal, and where on average each player’s part of the solution is stable, i.e. doesn’t change
much over time.

With our model of players leaving the game independently with probability p at each step, on 
average each player is expected to participate in 1/p rounds of the game, which turns out to be long 
enough to learn good strategies. On the other hand, players will experience dynamic population 
changes, and with no assumption on arriving players, they will need to adapt to the changing 
environment. With a player population of size n, and each player being replaced with turnover 
probability p, after each step we have np new players in expectation, so the population is constantly 
changing. We use an approximately optimal solution where each player’s allocation is relatively 
stable as a benchmark for each player; a stable enough benchmark that will allow adaptive learners 
to learn how to play at least as well as this solution. We will be interested in understanding what 
value of p is needed to guarantee high social welfare. 

To apply the above outline to a game, we need to develop techniques for point 3 above: show 
that there exists a stable sequence of close to optimal solutions in our changing environment. We 
present two ways to achieve this stability. In Section 4 we consider solution sequences that are
produced by greedy algorithms where a turnover in the input has only local influence in the output.
In Section 5 we consider solution sequences that are produced by differentially private algorithms
where a turnover in the input affects the whole output but only with a small probability. 

Our first application, via the greedy algorithm approach, is the unit demand auction problem 
analyzed in Section 4.1. In a unit-demand auction, after a change in one player, we could recompute
the optimal solution by an augmenting path algorithm. Unfortunately, a single augmenting path 
can change the assignment of many (or even all) players, and hence in no sense is the evolving 
optimal solution stable. Such major changes can happen even if the player valuations are all 0 or 1. 
We develop a greedy algorithm that finds stable solution sequences losing only a factor of 2 from 
the optimum value. To illustrate the idea, observe that in the special case of 0/1 values a greedy 
matching is essentially stable, and has size at least 1/2 of the optimal matching. In Section 4.1
we extend this idea beyond 0/1 valuations and give stable solution sequences to the unit demand
auction problem. We use this algorithm to show that players using adaptive learning guarantee high
social welfare in the item auction game with unit demand bidders even with a dynamically changing
player population, allowing for a probability p of player departure that depends logarithmically on 
the range of values players have, and does not depend on the number of items or players. 
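
The 0/1 special case can be sketched in a few lines. The code below is an illustration under simplifying assumptions (one copy per item, 0/1 values given as sets of liked items, a hypothetical local repair rule); it is not the algorithm of Section 4.1. It maintains a maximal matching, which always has at least half the size of a maximum matching, and repairs it after a turnover so that at most two players see their assignment change.

```python
def greedy_matching(values):
    """values[i] is the set of items player i values at 1. Returns {player: item}."""
    match, taken = {}, set()
    for i, liked in enumerate(values):
        for j in sorted(liked):
            if j not in taken:
                match[i] = j
                taken.add(j)
                break
    return match


def replace_player(values, match, i, new_liked):
    """Swap player i for a newcomer and repair the matching locally."""
    values = list(values)
    values[i] = new_liked
    new_match = dict(match)
    freed = new_match.pop(i, None)  # item released by the departing player
    taken = set(new_match.values())
    for j in sorted(new_liked):  # newcomer grabs any free item it likes
        if j not in taken:
            new_match[i] = j
            taken.add(j)
            break
    if freed is not None and freed not in taken:
        for k, liked in enumerate(values):  # one unmatched player may claim it
            if k not in new_match and freed in liked:
                new_match[k] = freed
                break
    return values, new_match


def max_matching_size(values):
    """Maximum matching size via augmenting paths (for comparison only)."""
    owner = {}

    def try_assign(i, seen):
        for j in values[i]:
            if j in seen:
                continue
            seen.add(j)
            if j not in owner or try_assign(owner[j], seen):
                owner[j] = i
                return True
        return False

    return sum(try_assign(i, set()) for i in range(len(values)))


values = [{0, 1}, {0}, {1, 2}, {3}]  # hypothetical 0/1 valuations
match = greedy_matching(values)
new_values, new_match = replace_player(values, match, 0, {2})
changed = sum(match.get(i) != new_match.get(i) for i in range(len(values)))
print(match, new_match, changed)
```

In this instance the greedy matching has size 3 against an optimum of 4, and the turnover of player 0 changes the assignment of only two players, whereas recomputing an optimal matching from scratch carries no such locality guarantee.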

Another application of the greedy algorithm approach is the bandwidth allocation problem
(Section 4.2), where some bandwidth is divided across players with smooth concave valuation
functions. Segmenting the bandwidth in small parts and viewing each segment as an item, we 
provide an almost optimal greedy approximation algorithm with similar stability guarantees as in 
the unit-demand auction setting. 

In Section 5 we develop a general method for applying our framework via the use of differential
privacy. Differential privacy has been developed by Dwork et al. (2006) for (approximately)
answering queries of databases of private information, while protecting the privacy of data. Consider a
database of sensitive personal information (such as medical data). The framework of differential
privacy has been developed to allow us to take advantage of the statistical information in the
database without compromising the privacy of the individuals. A differentially private response
to a database query is randomized, and it requires that if two databases differ only in the data
related to one individual, the probability that the response differs is very small. In recent years
many optimization problems have been shown to be solvable in a differentially private way (see
the recent book of Dwork and Roth (2014)).


The requirement of differential privacy for a solution to an optimization problem is very close to 
what we need for our stable solution sequences: if there is a differentially private close to optimal 
solution, this immediately implies that the solution cannot change much as one person’s data 
changes. We will be using a variant of the notion of differential privacy adapted to game theoretic
environments, joint differential privacy (Kearns et al. 2014). Player i’s share of any reasonable
solution must depend on his/her own input, so a solution cannot be fully differentially private.
Joint differential privacy fixes this discrepancy. In fact, the notion of marginal differential privacy
of Kannan et al. (2014) seems even more appropriate, as it only requires that the output for each
player j is differentially private in the data of other players. In order to take advantage of differential
privacy in the context of dynamically changing games, we need to overcome an important technical
difficulty: with the output of the differentially private algorithm randomized, the natural measure
of change in a sequence of such outputs is the sum of the total variation distances between adjacent
pairs of distributions. We need to turn the sequence of output distributions with low total variation
distance into a distribution of stable output sequences. We do this in Section 5 for joint differential
privacy. In Appendix EC.2 we show how to adapt our analysis to the weaker notion of marginal
differential privacy.
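
One standard way to turn two nearby output distributions into a pair of outputs that rarely differ is a maximal coupling. The sketch below is an illustration of that textbook construction on a hypothetical three-outcome example; it may differ from the construction used in Section 5. Given the old output x drawn from p, it specifies how to draw the new output so that it is distributed according to q while differing from x with probability exactly the total variation distance.

```python
from fractions import Fraction


def total_variation(p, q):
    """Total variation distance between two distributions given as dicts."""
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in support) / 2


def coupled_kernel(p, q):
    """Return K with K[x][y] = Pr[new output = y | old output = x].

    If x is drawn from p and y from K[x], then y is distributed as q and
    Pr[y != x] equals total_variation(p, q).
    """
    support = sorted(set(p) | set(q))
    tv = total_variation(p, q)
    kernel = {}
    for x in support:
        px = p.get(x, 0)
        if px == 0:
            continue  # x never occurs as the old output
        stay = min(px, q.get(x, 0)) / px  # keep the old output when possible
        row = {x: stay}
        if stay < 1:  # move the leftover mass to outcomes where q exceeds p
            for y in support:
                excess = max(q.get(y, 0) - p.get(y, 0), 0)
                if excess > 0:
                    row[y] = row.get(y, 0) + (1 - stay) * excess / tv
        kernel[x] = row
    return kernel


# Hypothetical output distributions before and after one player turns over.
p = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
q = {"a": Fraction(1, 2), "b": Fraction(2, 5), "c": Fraction(1, 10)}
kernel = coupled_kernel(p, q)

marginal = {}
for x, row in kernel.items():
    for y, prob in row.items():
        marginal[y] = marginal.get(y, 0) + p[x] * prob
change_prob = sum(
    p[x] * prob for x, row in kernel.items() for y, prob in row.items() if y != x
)
print(total_variation(p, q), change_prob)
```

Chaining such kernels along a sequence of distributions yields a random output sequence whose expected number of changes is the sum of the adjacent total variation distances, which is the quantity mentioned above.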
We illustrate the differential privacy approach via two applications. In Section 5.2 we use the
differentially private algorithm of Rogers et al. (2015) for congestion games to prove that in large
dynamic congestion games players using adaptive learning guarantee low social cost even with a
dynamically changing player population. In Section 5.3 we use differentially private algorithms of
Hsu et al. (2014) for matchings and allocations with gross substitute valuations to prove that
in large dynamic markets players using adaptive learning guarantee high social welfare even with
a dynamically changing player population. For simplicity of presentation, we focus on first price
auctions in Sections 4.1 and 5.3 but our results apply also to second price auctions (assuming no
overbidding) as well as any hybrids of the two auction formats. In this setting we show, roughly,
that if we have a smoothness-based price of anarchy bound for the single-shot game then, in the
dynamic population setting, the price of anarchy is $\epsilon$-close to the same bound assuming that
$p = O(\mathrm{poly}(\epsilon)/\mathrm{polylog}(n))$, as long as the market is large enough, in the sense that the supply of goods
is large enough. The simultaneous first price auction gives a price of anarchy bound of 2. Thus
even if approximately n/log(n) players are changing at each time-step, a constant inefficiency is
guaranteed.

As a benchmark for the latter two results, it is interesting to consider a simpler model of dynamic
player population, where the departure or arrival of a player is announced to all players. We expect
np new players each step, so in expectation there will be 1/(np) steps with no change at all. If
all the changes are announced, players could be expected to restart their learning algorithms due
to the change. If the stable period 1/(np) is long enough, we can use results for the price of total
anarchy to guarantee high social welfare. Under standard no-regret learning algorithms each player
will then have average regret approximately $O(\sqrt{n \cdot p})$. Hence, if we want the regret in the system
to be at most an $\epsilon$ fraction of the optimal welfare and hence contribute only an $\epsilon$ to the inefficiency,
we would require that $p = O(\epsilon^2/n)$. In other words, the probability that any player changes in a
period needs to be on the order of $1/n$, which is a tiny rate of change for large n.
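
The arithmetic of this benchmark can be checked numerically. The sketch below drops all constants and the logarithmic dependence on the number of strategies, so its outputs are orders of magnitude only; the function names and the sample values of n and $\epsilon$ are illustrative assumptions.

```python
import math


def announced_change_benchmark(n, p):
    """Heuristic numbers for the restart-on-announcement model.

    Returns (expected stable period, average regret over that period),
    using average regret ~ 1 / sqrt(period length) for a restarted learner.
    """
    stable_period = 1 / (n * p)
    average_regret = math.sqrt(1 / stable_period)  # equals sqrt(n * p)
    return stable_period, average_regret


def max_turnover_for(n, eps):
    """Largest p (up to constants) for which sqrt(n * p) <= eps."""
    return eps ** 2 / n


n, eps = 10_000, 0.1
p_max = max_turnover_for(n, eps)
stable_period, average_regret = announced_change_benchmark(n, p_max)
print(p_max, stable_period, average_regret)
```

With ten thousand players and a target of $\epsilon = 0.1$, this argument tolerates only about one expected replacement every hundred steps across the whole population.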

Our results are stronger than what is implied by this argument in two ways. First, we do not 
assume that change is announced, rather, we take advantage of the fact that players using learning 
algorithms can adjust to the changing environment even without the announcement of the change. 
More importantly, our results allow a probability of change much higher than the one required by the
above argument. The resulting dynamic game will not have long periods with no change. Multiple
players will be arriving and leaving at each step. We show that in many games, despite the constant 
change, there exists a good benchmark of the kind mentioned in the conditions above, where each 
player’s individual solution or allocation is relatively stable. The rate of expected change np in 
our applications will turn out to be high, especially as the number of players increases. Roughly
speaking, if we want the regret of the players to be an $\epsilon$ fraction of the optimal welfare, we will only
require that $p = O(\mathrm{poly}(\epsilon)/\mathrm{polylog}(n))$, where the constants depend on several parameters of each
game at hand, but importantly the bound depends only polylogarithmically on the number of players.
Moreover, in some games we even give a bound that is independent of n. Hence, for any constant $\epsilon$ we
allow almost a constant fraction of players to be changing at each period.

Further related work on dynamic games. Dynamic games have a long history in economics,
dynamical systems and operations research, see for example the survey books of Başar and Olsder
(1998) and Van Long (2010). The classic approach of analyzing behavior in such dynamic games
is to assume that players have prior beliefs about their competitors and that their behavior will
constitute a perfect Bayesian equilibrium or refinements of it such as the sequential equilibrium
of Fudenberg and Tirole (1991) or Markov perfect equilibrium of Maskin and Tirole (2001). The
intractability of such equilibrium solution concepts and the informational and rationality
assumptions that they impose on the players cast doubt on whether players in practice and in complex
game theoretic environments such as packet routing or internet ad-auctions would behave as
prescribed by such an equilibrium behavior.

Equilibrium-like behavior might be more plausible in large game approximations (see e.g. recent
work of Kalai and Shmaya (2015)). A natural approximation to equilibrium behavior in large games
(and particularly in auction settings) is that of the mean field equilibrium (Balseiro et al. 2015,
Weintraub et al. 2006, 2008, Adlakha et al. 2015, Iyer et al. 2014, Adlakha and Johari 2013).
However, even these large game approximations require the players to form almost correct beliefs about
the competition and exactly best-respond to these approximate large-market beliefs. Moreover, the
approach requires that the environment either is stochastically stable or evolves in a known
stochastic manner and in most situations the mean field approach captures behavior at a stochastically
stable state of the system. On the contrary, our dynamic model allows for adversarial changes and
our analysis attempts to analyze even constantly evolving and never converging behavior.
Moreover, our assumption that players invoke adaptive learning algorithms does not impose that players
possess or form any beliefs on the competition. Most of the algorithms that achieve adaptive regret
only require that the player is able to see the utility that each of his strategic options would have
given in retrospect, in past time-steps. Lastly, our approach also applies in small markets.

There is also a large literature on truthful mechanisms in a dynamic setting analogous to our 
dynamic player population model, where the goal is to truthfully implement a desired outcome 
with dynamically changing populations of users with private values. This line of work goes back
to Parkes and Singh (2003) in the computer science literature, but has also been considered much
earlier with queuing models by Dolan (1978). In more recent work, Cavallo et al. (2010) offer a
generalized VCG mechanism in an environment very similar to the one we are considering with
departures and arrivals, and also provide a nice overview of work on truthful mechanisms in a
dynamic setting. For a more complete overview, the reader is referred to the survey on dynamic
auctions by Bergemann and Said (2010).

Further related work on learning in games. There is a large literature analyzing learning in
games, dating back to the work on fictitious play by Brown (1951). For an overview of this area,
the reader is referred to the books of Fudenberg and Levine (1998) and Cesa-Bianchi and Lugosi
(2006). A standard notion of learning in games is that of no-regret learning. The notion of
no-regret against the best fixed action in hindsight dates back to the work of Hannan (1957), and is
also referred to as Hannan consistency. There are many learning algorithms achieving this guarantee
such as regret matching by Hart and Mas-Colell (2000) and multiplicative weights updates by
Freund and Schapire (1997).

Related work on learning in dynamic environments. The notion of no-regret learning against
time-varying benchmarks, as opposed to fixed actions, traces back to Herbster and Warmuth (1998)
who provided guarantees compared to the best sequence of k experts. The stronger notion of
adaptive regret, i.e. having guarantees for every sub-interval, was formalized by Hazan and Seshadhri
(2007) and near-optimal adaptive regret guarantees were achieved through a series of algorithms
by Lehrer (2003), Blum and Mansour (2007), Cesa-Bianchi et al. (2012), and Luo and Schapire
(2015). One important trait of these algorithms is that they display some sort of recency bias, in
the sense that the influence of past steps decays as time goes by. Recent experimental evidence
by Fudenberg and Peysakhovich (2014) suggests that humans display such forms of recency bias
when making repeated decisions.

Competing against an adaptive benchmark has also been studied in the context of online convex
optimization. Besbes et al. (2013) compare to a target function that is changing from step to step.
In order to guarantee some stability across steps they require that the total variation distance
between subsequent target functions is bounded by some number. This is a way to capture the
notion that subsequent rounds are not very different, related to our notion of turnover probability
which in expectation guarantees a similar stability bound on the number of changes per step.


2. Preliminaries 

Games and mechanisms. We will consider a game played repeatedly, where the population of
players is drifting over time. Let G be an n-player normal form stage game and assume that game
G is played repeatedly T times. Each player i who participates in a stage game has a strategy space
$S_i$, with $\max_i |S_i| = N$, a type $v_i \in \mathcal{V}_i$ and a cost function $c_i(s; v_i)$ that depends on the strategy
profile $s \in \times_i S_i$, and on his type. We will denote with $C(s; v) = \sum_{i \in [n]} c_i(s; v_i)$ the social cost, where
s is a strategy profile and v a type profile. We will also analyze the case when the stage game
is a utility maximization mechanism M, which takes as input a strategy profile and outputs an
allocation $X_i(s)$ for each player and a payment $P_i(s)$. We will assume that players have quasi-linear
utility $u_i(s; v_i) = v_i(X_i(s)) - P_i(s)$ and the welfare is the sum of valuations (sum of utilities of
bidders and revenue of auctioneer): $W(s; v) = \sum_{i \in [n]} v_i(X_i(s))$.

In all the games that we study, the optimal social welfare problem can equivalently be defined as
an optimization over a “feasible solution space” $\mathcal{X}^n$ which involves no incentives (e.g. in network
congestion games it is the set of feasible integral flows, in a combinatorial auction setting it is the
set of feasible partitions of items to bidders, in the bandwidth allocation setting it is the set of
valid partitions of the bandwidth). We will overload the social cost and welfare notations, and for
a feasible solution (or allocation) $x \in \mathcal{X}^n$ we will use C(x; v) and W(x; v) to denote the social cost
or welfare of the solution. We denote the optimal social cost or welfare for a type profile v as
$\mathrm{Opt}(v) = \min_{x \in \mathcal{X}^n} C(x; v)$ and $\mathrm{Opt}(v) = \max_{x \in \mathcal{X}^n} W(x; v)$ respectively.

Definition 2.1 (Repeated game/mechanism with dynamic population). A repeated
game with dynamic population consists of a stage game G played for T time steps. Let $P^t$ denote
the set of players at time t, where each player $i \in P^t$ has a private type $v_i^t$. After each step, every
player independently exits the game with a (small) probability p > 0 and is replaced by a new
player with an arbitrary type. The utility of a player is additive across steps. We denote this
repeated game with $\Gamma = (G, T, p)$. Similarly, we denote with $\mathcal{M} = (M, T, p)$ a mechanism that is
played T times with player replacement probability p.

Our model of dynamic population assumes that after each step every player independently exits 
the game with a probability $p > 0$, so each player is expected to play the game for $1/p$ rounds. To 
keep our model simple, we make the assumption that when a player exits, she is replaced by a new 
participant. This assumption guarantees that we will have exactly $n$ players in each iteration, with 
a $p$ fraction of the population changing each iteration in expectation. We make no assumption 
about the types of the newly arriving players, which can be picked adversarially. Most of our results 
could be extended to the case when the set of players being replaced is also chosen adversarially, 
subject to some constraint on the number of per-step replacements. 
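
The replacement process described above can be simulated in a few lines. The following is only a sketch: the function name `simulate_turnover` and its parameters are hypothetical, and the adversarial choice of new types is ignored, since only the number of replacements per slot is tracked.

```python
import random


def simulate_turnover(n, T, p, seed=0):
    """Count, for each of the n player slots, how often its occupant is replaced.

    After every one of the T steps, each slot is independently vacated with
    probability p and immediately refilled, so the population size stays n.
    """
    rng = random.Random(seed)
    replacements = [0] * n
    for _ in range(T):
        for i in range(n):
            if rng.random() < p:
                replacements[i] += 1
    return replacements


if __name__ == "__main__":
    n, T, p = 100, 1000, 0.01
    counts = simulate_turnover(n, T, p)
    # In expectation each slot turns over p * T times, i.e. a player lasts about 1/p rounds.
    print("average replacements per slot:", sum(counts) / n, "expected:", p * T)
```
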

To simplify the notation, we will use player $i$ to denote the current $i$-th player, where this player is 
replaced by a new $i$-th player with probability $p$ each round. An alternate view of the dynamic player 
population is to think of players as changing types after each iteration with a small probability $p$. 
We will refer to such a change as player $i$ switches or turns over. 

Basic notation. For any quantity $x$ we will denote with $x^{1:T}$ the sequence $x^1, \ldots, x^T$. For instance, 
$v_i^{1:T}$ will denote the sequence of types of player $i$ produced by the random choice of leaving players 
and by the choices of the adversary. 

We will consider three special classes of games, two welfare-maximization mechanisms and one 
cost-minimization game: 

First-price Auction Game. The auction games we consider are defined by a set of $m$ goods, 
where we will assume that each good has a supply of $s$ identical copies in each iteration. We assume 
for simplicity of presentation that the supply of each item is identical. The players are buyers who 
repeatedly participate in item auctions to buy copies of the items. Each buyer wants at most one 
copy of each item. The type of a buyer $i$ is her valuation over sets of items. 

We will use $v_i^t(A)$ to denote the valuation of the $i$-th player in iteration $t$, if she gets at least one 
copy of each item in set $A \subseteq [m]$. We will assume that valuations are non-negative and at most 
1. Last, we will assume that conditional on having a set $S$, the marginal value of a player for any 
extra item $j$, i.e. $v_i^t(S \cup \{j\}) - v_i^t(S)$, is either 0 or at least some constant $\rho$. Valuations over time 
are additive, which models perishable items, such as advertising opportunities, where a player will 
play to repeatedly win items in each period she is participating. 

We will focus the presentation on first price item auctions, where players submit a bid on each 
item separately: if we have $s$ copies of an item, the $s$ highest bidders for the item get one copy each, 
and pay their bid (ties are broken arbitrarily). The bid on each item comes from some sufficiently 
fine, discrete bid space. Specifically, bids are multiples of $\delta \cdot \rho$ for some small $\delta$ and lie in $[0,1]$. Our 
results also extend to second price auctions, as well as hybrid auctions. 
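
As an illustration of the allocation and payment rule just described, here is a minimal sketch for a single item. The function name `first_price_item_auction` and the dictionary-based interface are assumptions made for this example; ties are broken by insertion order, which is one arbitrary choice among many.

```python
def first_price_item_auction(bids, supply):
    """Run one first-price auction for a single item with `supply` identical copies.

    bids:   dict mapping player -> bid on this item.
    Returns a dict mapping each winner to the payment she makes (her own bid).
    """
    # Sort by bid, highest first. Python's sort is stable, so ties are broken
    # by the order in which players appear in `bids` (an arbitrary rule).
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    return {player: bid for player, bid in ranked[:supply]}


if __name__ == "__main__":
    winners = first_price_item_auction({"a": 0.5, "b": 0.75, "c": 0.25}, supply=2)
    print(winners)
```
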

In our first application (in Section 4.1) we will consider unit demand buyers and the supply of 
each item is arbitrary. Thus a buyer's value for any set of items in one iteration is their value for 
the best single item in the set they acquired. For the case of unit demand, we use $v_i^t(j)$ to denote 
the value of an item $j$ for buyer $i$ at time $t$, so the player's value for a set $A$ is $v_i^t(A) = \max_{j \in A} v_i^t(j)$. 

In this application, we will assume that players will bid for at most one item at each iteration. 
Thus the number of strategies available to each player is $N = \frac{m}{\delta \cdot \rho}$. 

In Section 5.3 we consider large markets of first price item auctions with players that have more 
complex valuations satisfying the gross substitute property. In this application, we will assume that 
players want at most $d$ different types of items, i.e. for any set $A$: $v_i^t(A) = \max_{T \subseteq A : |T| = d} v_i^t(T)$, and 
we assume that they will bid on only $d$ different auctions. Thus the number of strategies available 
to them is $N = \binom{m}{d} \left(\frac{1}{\delta \cdot \rho}\right)^d$. 


Proportional bandwidth allocation mechanism. The proportional bandwidth allocation mechanism, 
introduced by Kelly (1997) and first studied for price of anarchy by Johari and Tsitsiklis 
(2011), is defined by a bandwidth of $B$ and a valuation function for each player which is concave 
in the bandwidth she receives. At every round, each player $i$ submits a bid $b_i^t$, pays her bid and 
gets allocated bandwidth proportional to her bid, i.e. $x_i(b^t) = \frac{b_i^t \cdot B}{\sum_j b_j^t}$. 


In this setting the type of the player is her valuation function. We will use $v_i^t(x_i)$ to denote 
player $i$'s valuation for bandwidth $x_i$. We will make the assumption for the valuation functions 
that their slope is lower bounded by some $\rho > 0$. Player $i$'s utility will again be quasilinear, i.e. 
$u_i^t(b^t) = v_i^t(x_i(b^t)) - b_i^t$. As before, we will assume that bids are only multiples of $\rho\delta$ for 
some $\delta > 0$. Therefore the bidding space is discrete and the number of strategies available 
is at most $N = \frac{1}{\rho\delta}$. 
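
The proportional rule and the quasilinear utility can be written out directly. This is a sketch under stated assumptions: the function names are hypothetical, and the convention that nobody receives bandwidth when all bids are zero is an assumption of this example, not something specified in the text.

```python
def proportional_allocation(bids, B=1.0):
    """Split bandwidth B among players in proportion to their bids."""
    total = sum(bids)
    if total == 0:
        # Assumption for this sketch: with no positive bid, nothing is allocated.
        return [0.0] * len(bids)
    return [B * b / total for b in bids]


def quasilinear_utility(valuation, bids, i, B=1.0):
    """Utility of player i: value of the bandwidth received minus the bid paid."""
    x = proportional_allocation(bids, B)
    return valuation(x[i]) - bids[i]


if __name__ == "__main__":
    bids = [0.25, 0.75]
    print(proportional_allocation(bids))                 # shares of the unit bandwidth
    print(quasilinear_utility(lambda x: x, bids, i=0))   # linear valuation v(x) = x
```
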

Atomic congestion game. In the atomic congestion game, we assume that we have a set of 
congestible elements $E$ (and let $m = |E|$), where each element $e$ has a latency function $\ell_e(x)$, or cost, that 
is monotone non-decreasing in the congestion $x$. Given some selection of sets $S_i \subseteq E$ for each player 
$i$, the congestion in an element $e$ is the number of players that have selected it: $x_e(s) = |\{i : e \in S_i\}|$, 
and the cost of player $i$ is then the sum $\sum_{e \in S_i} \ell_e(x_e(s))$. 

A player's type $v_i$ denotes the possible subsets of the element set she can select. For example, in 
the routing game on a graph, the type of a player $i$ is a source-sink pair $(o_i, d_i)$, and her strategy 
is the choice of a path from $o_i$ to $d_i$ in the graph. We assume that a player's cost is infinity if her 
chosen set is not one of the subsets her type allows. Thus the number of strategies available to the player is 
the number of $(o_i, d_i)$ paths in the graph, and thereby $N$ is the maximum number of such possible 
paths across possible source-sink pairs. 
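
The cost structure of the atomic congestion game can be sketched as follows. The helper names (`congestion`, `player_cost`, `social_cost`) and the representation of latencies as a dictionary of Python callables are assumptions made for illustration.

```python
from collections import Counter


def congestion(profile):
    """Number of players using each element; profile[i] is the set S_i chosen by player i."""
    return Counter(e for S in profile for e in S)


def player_cost(profile, i, latency):
    """Cost of player i: sum of latencies of her elements at the current congestion."""
    x = congestion(profile)
    return sum(latency[e](x[e]) for e in profile[i])


def social_cost(profile, latency):
    """Total cost summed over all players."""
    return sum(player_cost(profile, i, latency) for i in range(len(profile)))


if __name__ == "__main__":
    latency = {"a": lambda x: x, "b": lambda x: x}   # identity latencies
    profile = [{"a", "b"}, {"b"}]
    print(player_cost(profile, 0, latency), social_cost(profile, latency))
```
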

Adaptive Learning in Dynamic Environments. We use the notion of adaptive regret introduced 
by Hazan and Seshadhri (2007). We start by defining no-regret learning, and then consider adaptive 
regret. To formally define regret, and no-regret learning, we consider an arbitrary loss function. For 
a cost-game, we will think of the cost the player incurs as loss. For a utility game, we define loss 
at each step as the difference between the maximum possible utility and the player's utility. Consider 
a player who has $N$ possible choices, the $N$ strategies that the player has to choose from. In defining 
regret, and no-regret learning, we are focusing on a single player, and hence we will temporarily 
drop the index $i$ for the player from the notation. We use $L(s,t)$ to denote the loss (cost or lost 
utility) of the player if she plays strategy $s$ at time $t$. We can assume without loss of generality 
that $L(s,t)$ is a value in $[0,1]$, but make no assumption beyond this about the sequence of loss 
values. We say that a player achieves no-regret if she does at least as well over a period of time 
as the best choice $s^*$ with hindsight. Formally, we say the regret of a strategy sequence $s^{1:T}$ is 

$$\sum_{t=1}^{T} L(s^t, t) - \min_{s^*} \sum_{t=1}^{T} L(s^*, t).$$

Note that even with a stable set of players the value $L(s,t)$ will vary over time, depending on the 
strategies chosen by other players. As mentioned in the Introduction, there are many simple 
algorithms (see, for example, Arora et al. 2012) that achieve regret $O(\sqrt{T})$ against any (adversarial) 
sequence of loss values $L(s,t)$. 

In dealing with changing environments, we will need a stronger assumption on the learning of the 
players: we need that the players adapt their strategies to the environment. We will use a notion of 
adaptive regret, regret over (long) intervals of time $[\tau_1, \tau_2)$, in addition to the regret of the whole 
sequence, defined by Hazan and Seshadhri (2007). 

Definition 2.2 (Adaptive Regret). The adaptive regret of strategy sequence $s^{1:T}$ in time 
frame $[\tau_1, \tau_2)$ is defined as: 

$$R(\tau_1, \tau_2) = \max_{s^*} \sum_{t=\tau_1}^{\tau_2 - 1} \left( L(s^t, t) - L(s^*, t) \right)$$
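
To make the difference between regret over the whole sequence and adaptive regret concrete, the following sketch evaluates $R(\tau_1, \tau_2)$ on a small loss table. The function name and the 0-indexed time convention are assumptions of this example.

```python
def adaptive_regret(losses, played, tau1, tau2):
    """Adaptive regret over the half-open interval [tau1, tau2), 0-indexed.

    losses[t][s] is the loss L(s, t) of strategy s at time t, and
    played[t] is the strategy actually chosen at time t.
    """
    steps = range(tau1, tau2)
    incurred = sum(losses[t][played[t]] for t in steps)
    num_strategies = len(losses[0])
    best_fixed = min(sum(losses[t][s] for t in steps) for s in range(num_strategies))
    return incurred - best_fixed


if __name__ == "__main__":
    # Strategy 0 is perfect in the first half, strategy 1 in the second half.
    losses = [[0, 1], [0, 1], [1, 0], [1, 0]]
    played = [0, 0, 0, 0]   # the player never adapts
    print(adaptive_regret(losses, played, 0, 4))  # regret over the whole sequence
    print(adaptive_regret(losses, played, 2, 4))  # regret on the second half only
```

In this toy sequence the player has zero regret over the whole horizon but positive regret on the second half, which is exactly the failure to adapt that the interval-based notion detects.
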


Adaptive learning algorithms go back to the work of Lehrer (2003) and Blum and Mansour (2007), 
who considered more general notions of regret. We say that a player satisfies adaptive learning if 
her regret $R(\tau_1, \tau_2)$ can be bounded by a function that is $o(\tau_2 - \tau_1)$, that is, regret grows slower 
than linearly over time. Our results are affected by the quality of the learning algorithm players 
use, as with better learning we can tolerate higher turnover in the population of players. In the rest 
of the paper we will use the learning bounds of the recent work of Luo and Schapire (2015), who 
developed an adaptation of the classical Hedge algorithm, AdaNormalHedge, that achieves small 
regret on all intervals. An alternate algorithm with a bound of the same type was also given in 
(Blum and Mansour 2007). 

Theorem 2.1 (Luo and Schapire 2015). Suppose a player uses AdaNormalHedge and 
selects strategy sequence $s^{1:T}$. For any time frame $[\tau_1, \tau_2)$, AdaNormalHedge achieves adaptive 
regret: 

$$\mathbb{E}[R(\tau_1, \tau_2)] \le C_R \sqrt{(\tau_2 - \tau_1) \ln(N \tau_2)}$$

where $N$ is the number of choices, $C_R$ is a small constant less than 2, and loss is assumed to be in 
$[0,1]$ for all $s$ and $t$. 
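
A quick numeric reading of a bound of the form $C_R \sqrt{(\tau_2 - \tau_1) \ln(N \tau_2)}$ shows why it counts as adaptive learning: the per-step regret shrinks as the interval grows. The sketch below only evaluates the formula; it does not implement AdaNormalHedge, and the default `C_R = 2.0` is an assumed conservative stand-in for the constant "less than 2".

```python
import math


def adaptive_regret_bound(tau1, tau2, N, C_R=2.0):
    """Evaluate C_R * sqrt((tau2 - tau1) * ln(N * tau2)) for 1 <= tau1 < tau2."""
    return C_R * math.sqrt((tau2 - tau1) * math.log(N * tau2))


if __name__ == "__main__":
    N = 10
    for length in (10, 100, 1000, 10000):
        bound = adaptive_regret_bound(1, 1 + length, N)
        print(length, round(bound, 2), "per step:", round(bound / length, 4))
```
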

In what follows, we will assume that all players in our repeated game use a learning algorithm 
with low adaptive regret, and will use $R_i(\tau_1, \tau_2)$ to denote the adaptive regret of player $i$ over the period 
$[\tau_1, \tau_2)$. For simplicity of presentation, we will assume that, for some constant $C_R$, 
$\mathbb{E}[R_i(\tau_1, \tau_2)] \le C_R \sqrt{(\tau_2 - \tau_1) \ln(N \tau_2)}$ for all players and all time periods $[\tau_1, \tau_2)$. Throughout the paper, we will 
refer to this assumption as "The players use adaptive learning algorithms with constant $C_R$." Our 
results would smoothly degrade if we assumed only that players achieve adaptive regret that is 
some other sublinear concave function of the interval's length $(\tau_2 - \tau_1)$. 

Solution-based Smoothness in Games and Mechanisms. Smooth games were introduced by 
Roughgarden (2009) as a general framework bounding the price of anarchy in games. He also 
showed that smoothness based price of anarchy bounds extend to outcomes in repeated games 
when all players use no-regret learning. 

We need a somewhat more general variant of smooth games, that compares the cost or utility 
resulting from a strategy choice to the social welfare of a specific solution, rather than comparing 
to the social optimum. For two strategy vectors $s$ and $s^*$ we use $(s_i^*, s_{-i})$ to denote the vector where 
player $i$ uses strategy $s_i^*$ and all other players $j$ use their strategy $s_j$. 

Definition 2.3 (Solution-based smooth game). A cost-minimization game $G$ is $(\lambda, \mu)$-smooth 
with respect to a solution $x$, if for some $\lambda > 0$ and $\mu < 1$, for any type profile $v$, for each 
player $i$ there is a strategy $s_i^* \in S_i$ depending on her type $v_i$ and her part of the solution $x_i$ such 
that for any strategy profile $s$ 

$$\sum_i c_i(s_i^*(v_i, x_i), s_{-i}; v_i) \le \lambda C(x; v) + \mu C(s; v)$$

A game $G$ is solution-based $(\lambda, \mu)$-smooth if it is smooth with respect to any feasible solution 
$x \in \mathcal{X}^n$. 


Note that, when $x$ is the optimal solution, we recover the traditional examples of smooth games, as 
the deviating strategy $s_i^*$ usually depends on other players' types through her part of the optimal 
solution $x^*(v)$. A game that is $(\lambda, \mu)$-smooth with respect to the optimal solution $x^*(v)$ is 
$(\lambda, \mu)$-smooth in the sense of Roughgarden (2009), the game has price of anarchy bounded by 
$\lambda/(1 - \mu)$, and the average social cost of no-regret learning outcomes is also bounded by $\frac{\lambda}{1 - \mu} \mathrm{Opt}$. 
More generally: 

Theorem 2.2. If a game is $(\lambda, \mu)$-smooth with respect to a solution $x$, then at any Nash equilibrium 
of the game, as well as at any no-regret learning outcome, the expected cost is at most $\frac{\lambda}{1 - \mu} C(x; v)$. 

Proof of Theorem 2.2. We include the proof for the case of pure Nash equilibria for 
completeness. Consider a strategy vector $s$ that is a Nash equilibrium. At a Nash equilibrium, no player has 
regret for any alternate strategy, so in particular we get that $c_i(s_i^*(v_i, x_i), s_{-i}; v_i) \ge c_i(s; v_i)$ for all 
$i$. Adding up these inequalities and using the smoothness property, we get 

$$C(s; v) = \sum_{i \in [n]} c_i(s; v_i) \le \sum_{i \in [n]} c_i(s_i^*(v_i, x_i), s_{-i}; v_i) \le \lambda C(x; v) + \mu C(s; v) \qquad (2.1)$$

The claimed bound follows by rearranging the terms. The proof extends to randomized equilibria 
by taking expectations, including the distribution resulting from no-regret learning in the limit. □ 
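
The argument can be checked by brute force on a tiny instance. The sketch below uses a two-player congestion game with identity latencies $\ell_e(x) = x$; the game itself is made up for illustration, and the constants $(\lambda, \mu) = (5/3, 1/3)$ are the well-known smoothness parameters of atomic congestion games with affine latencies (Roughgarden 2009), brought in here as an outside fact rather than taken from this section. The solution $x$ is the cost-minimizing profile and the deviation is $s_i^* = x_i$.

```python
from collections import Counter
from itertools import product

# Hypothetical two-player instance on elements a, b, c with latency ell_e(x) = x.
STRATEGIES = [
    [frozenset("a"), frozenset("bc")],   # options of player 0
    [frozenset("b"), frozenset("ac")],   # options of player 1
]
LAM, MU = 5 / 3, 1 / 3                   # smoothness constants for affine latencies


def cost(profile, i):
    load = Counter(e for S in profile for e in S)
    return sum(load[e] for e in profile[i])


def social_cost(profile):
    return sum(cost(profile, i) for i in range(len(profile)))


def deviate(profile, i, s_i):
    return profile[:i] + (s_i,) + profile[i + 1:]


def is_pure_nash(profile):
    return all(cost(profile, i) <= cost(deviate(profile, i, alt), i)
               for i in range(len(profile)) for alt in STRATEGIES[i])


def smooth_wrt(x, s):
    """Check sum_i c_i(x_i, s_{-i}) <= LAM * C(x) + MU * C(s) for one profile s."""
    lhs = sum(cost(deviate(s, i, x[i]), i) for i in range(len(s)))
    return lhs <= LAM * social_cost(x) + MU * social_cost(s) + 1e-9


PROFILES = list(product(*STRATEGIES))
x = min(PROFILES, key=social_cost)       # the benchmark solution

assert all(smooth_wrt(x, s) for s in PROFILES)
for s in filter(is_pure_nash, PROFILES):
    # Rearranged inequality (2.1): C(s) <= LAM / (1 - MU) * C(x).
    assert social_cost(s) <= LAM / (1 - MU) * social_cost(x) + 1e-9
```
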


Syrgkanis and Tardos (2013) give a related definition for smooth mechanisms assuming 
quasilinear valuation for all players. Again, we define a mechanism smooth with respect to a solution $x$, 
and allow the choice of strategy $s_i^*$ to depend on the player's part of the solution $x_i$ and her type 
$v_i$. More formally, we will use the following definition. 

Definition 2.4 (Solution-based smooth mechanism). A mechanism $M$ is $(\lambda, \mu)$-smooth 
with respect to a solution $x$ for some $\lambda, \mu \ge 0$ if for any valuation profile $v$, for each player $i$ there 
exists a deviating strategy $s_i^* \in S_i$ depending on $v_i$ and $x_i$ such that for all strategy vectors $s$, 

$$\sum_i u_i(s_i^*(v_i, x_i), s_{-i}; v_i) \ge \lambda W(x; v) - \mu \mathcal{R}(s),$$

where $\mathcal{R}(s)$ denotes the revenue of the auctioneer under strategy profile $s$. $M$ is a solution-based 
$(\lambda, \mu)$-smooth mechanism if the latter holds for any feasible solution $x \in \mathcal{X}^n$. 


Syrgkanis and Tardos (2013) proved that a $(\lambda, \mu)$-smooth mechanism has price of anarchy 
bounded by $\max(\mu, 1)/\lambda$, and the average social welfare of no-regret learning outcomes is also at 
least $\frac{\lambda}{\max(\mu, 1)} \mathrm{Opt}(v)$. Analogously we get: 

Theorem 2.3. If a mechanism is $(\lambda, \mu)$-smooth with respect to a solution $x$, then at any Nash 
equilibrium of the game, as well as at any no-regret learning outcome, the expected social welfare is 
at least $\frac{\lambda}{\max(\mu, 1)} W(x; v)$. 

Differential privacy. Differential privacy has been developed for databases storing private 
information for a population. A database $D \in \mathcal{V}^n$ is a vector of inputs, one for each player. Two databases 
are $i$-neighbors if they differ just in the $i$-th coordinate, i.e. differ only in the input of the $i$-th player. 
If two databases are $i$-neighbors for some $i$, they are called neighboring databases. 

In the context of repeated games, every time a player leaves or arrives, the solution may change 
drastically. Instead of comparing the game outcomes to the socially optimal solution that changes 
with every player change, we will want to compare the outcome to a more stable but close to 
optimal solution. The notion of differential privacy offers a useful framework for this goal. 

Dwork et al. (2006) define an algorithm as differentially private if one person's information has 
little influence on the outcome. In the setting of a game or mechanism the outcome for player $i$ 
clearly should depend on player $i$'s input (her claimed valuation, or source destination pair), so it 
cannot be differentially private. The notion of joint differential privacy was developed 
by Kearns et al. (2014) to adapt differential privacy to settings where the algorithm has a set of 
$n$ outcomes, one for each player. We use $\mathcal{X}$ to denote the set of possible outcomes for one player, 
so an algorithm in this context is a function $\mathcal{A} : \mathcal{V}^n \to \mathcal{X}^n$. The algorithm is jointly differentially 
private if, for all players $i$, the output for all other players is differentially private in the input of 
player $i$. More formally, 

Definition 2.5 (Kearns et al. 2014). An algorithm $\mathcal{A} : \mathcal{V}^n \to \mathcal{X}^n$ is $(\epsilon, \delta)$-jointly 
differentially private if for every $i$, for every pair of $i$-neighbors $D, D' \in \mathcal{V}^n$, and for every subset of outputs 
$S \subseteq \mathcal{X}^{n-1}$, 

$$\Pr[\mathcal{A}(D)_{-i} \in S] \le \exp(\epsilon) \Pr[\mathcal{A}(D')_{-i} \in S] + \delta$$

If $\delta = 0$, we say that $\mathcal{A}$ is $\epsilon$-jointly differentially private. 

We will see that close to optimal and jointly private solutions, along with smoothness with respect 
to the sequence of solutions $x^t$, can be used to show the strength of learning outcomes in our setting. 
Over the last few years there have been a number of algorithms developed that solve problems 
close to optimally in a differentially private way. See the recent book of Dwork and Roth (2014) 
for a survey. In this paper, we will take advantage of such algorithms, including the algorithms 
for solving matching problems (Hsu et al. 2014) and finding socially optimal routing (Rogers et al. 
2015). 


Marginal privacy. A recent work of Kannan et al. (2014) introduced the weaker notion of 
marginal differential privacy, also in the setting when the algorithm outputs a set of $n$ outcomes, 
one for each player. A mechanism is marginally differentially private if the distribution of outcomes 
for any one player $j$ is differentially private in the input of another player $i \ne j$, but without requiring 
that the combined output of all players $j \ne i$ be differentially private in the $i$-th input. Our 
main results continue to hold even under this weaker notion of privacy. However, since no improved 
approximation algorithms are known under this notion for the settings that we study, we focus on 
joint privacy in the main part of the paper and present the extension in Appendix EC.2. 


3. Price of Anarchy for Dynamic Games and Mechanisms 

In this section we offer our two main theorems, which follow the high level outline presented 
in Section 1. Specifically, we formalize the connection between adaptive learning, solution-based 
smoothness and the existence of approximately optimal and stable solution sequences. We give this 
connection both in the context of cost-minimization games and in the context of mechanisms. In 
the next section we give an application of the framework to unit-demand matching markets and 
bandwidth allocation, and in Section 5 we provide a more canonical approach towards producing 
stable sequences by connecting the problem to differential privacy, along with a way we can relax 
the stability notion required. 

Definition 3.1 ($k$-stable sequence). A randomized sequence of solutions $x^{1:T} = \{x^1, \ldots, x^T\}$ 
and types $v^{1:T} = \{v^1, \ldots, v^T\}$ is $k$-stable if the average (across players) expected number of changes 
in each individual player's solution or type is at most $k$, i.e., if $k_i(v_i^{1:T}, x_i^{1:T})$ is the number of times 
that $x_i^t \ne x_i^{t+1}$ or $v_i^t \ne v_i^{t+1}$, then: 

$$\frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\left[k_i(v_i^{1:T}, x_i^{1:T})\right] \le k$$
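
For one realized pair of sequences, the quantity inside the expectation is easy to compute. The sketch below counts the changes $k_i$ per player and averages them across players; the function names and the list-of-lists layout are assumptions of this example, and the expectation over the randomness of the sequence is not taken.

```python
def num_changes(types_i, solutions_i):
    """Number of steps t at which player i's type or her part of the solution changes."""
    return sum(
        1
        for t in range(len(types_i) - 1)
        if types_i[t] != types_i[t + 1] or solutions_i[t] != solutions_i[t + 1]
    )


def average_changes(types, solutions):
    """Average over players of num_changes for one realized sequence.

    types[i][t] and solutions[i][t] are player i's type and allocation at step t.
    """
    n = len(types)
    return sum(num_changes(types[i], solutions[i]) for i in range(n)) / n


if __name__ == "__main__":
    types = [["u", "u", "w"], ["z", "z", "z"]]
    solutions = [[1, 1, 1], [None, 2, 2]]
    print(average_changes(types, solutions))
```
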

Theorem 3.1 (Main theorem for cost-minimization games). Consider a repeated cost 
game with dynamic population $\Gamma = (G, T, p)$, such that the stage game $G$ is solution-based 
$(\lambda, \mu)$-smooth and costs are bounded in $[0,1]$. Suppose that $v^{1:T}$ and $x^{1:T}$ is a $k$-stable sequence, 
such that $x^t$ is feasible (pointwise) and $\alpha$-approximately (in-expectation) optimal for each $t$, i.e. 
$\mathbb{E}[C(x^t; v^t)] \le \alpha \cdot \mathbb{E}[\mathrm{Opt}(v^t)]$. If players use an adaptive learning algorithm with constant $C_R$ then: 

$$\sum_t \mathbb{E}[C(s^t; v^t)] \le \frac{\lambda \alpha}{1 - \mu} \sum_t \mathbb{E}[\mathrm{Opt}(v^t)] + \frac{n}{1 - \mu} \cdot C_R \sqrt{T \cdot (k + 1) \cdot \ln(NT)}$$

An analogue of the theorem above holds for mechanisms too. 

Theorem 3.2 (Main theorem for mechanisms). Consider a repeated mechanism with 
dynamic population $\mathcal{M} = (M, T, p)$, such that the stage mechanism $M$ is solution-based 
$(\lambda, \mu)$-smooth and utilities are bounded in $[0,1]$. Suppose that $v^{1:T}$ and $x^{1:T}$ is a $k$-stable sequence, 
such that $x^t$ is feasible (pointwise) and $\alpha$-approximately optimal (in-expectation) for each $t$, i.e. 
$\alpha \cdot \mathbb{E}[W(x^t; v^t)] \ge \mathbb{E}[\mathrm{Opt}(v^t)]$. If players use an adaptive learning algorithm with constant $C_R$ 
then: 

$$\sum_t \mathbb{E}[W(s^t; v^t)] \ge \frac{\lambda}{\alpha \max\{1, \mu\}} \sum_t \mathbb{E}[\mathrm{Opt}(v^t)] - \frac{n}{\max\{1, \mu\}} \cdot C_R \sqrt{T \cdot (k + 1) \cdot \ln(NT)}$$

We also show an improved bound for some classes of mechanisms that satisfy a non-negative 
utility property and which we will use in our application in Section 4.1. For the case of simultaneous 
single-item first price auctions with unit-demand bidders it leverages the fact that by bidding only 
on one item at a time, player utilities are guaranteed to be nonnegative at all times, and only a 
subset of the players (e.g. at most $m$ in the case of an $m$ item auction) are being allocated in any 
feasible allocation. Under these conditions, players with no item in the feasible allocation will have 
no regret against a deviating strategy that attempts to "win" the empty allocation. For a general 
mechanism $M$ the required property is stated as follows: 

Property 1. $M$ has an empty allocation $\emptyset$ in the allocation space. Moreover $u_i(s_i^*(v_i, \emptyset), s_{-i}) = 0$ 
and $u_i(s; v_i) \ge 0$ for any strategy that is used by the players. 

Theorem 3.3 (Improved bound for mechanisms). Consider a repeated mechanism with 
dynamic population $\mathcal{M} = (M, T, p)$, such that the stage mechanism $M$ is solution-based 
$(\lambda, \mu)$-smooth, satisfies Property 1 and utilities are in $[0,1]$. Assume that there exists a randomized 
sequence of solutions $x^{1:T} = \{x^1, \ldots, x^T\}$ and types $v^{1:T} = \{v^1, \ldots, v^T\}$, such that $x^t$ is 
feasible (pointwise) and $\alpha$-approximately optimal (in-expectation) for each $t$, i.e. 
$\alpha \cdot \mathbb{E}[W(x^t; v^t)] \ge \mathbb{E}[\mathrm{Opt}(v^t)]$. 

For each player $i$, let $\kappa_i(v_i^{1:T}, x_i^{1:T})$ be the number of times that $x_i^t \ne x_i^{t+1}$ or 
($x_i^t \ne \emptyset$ and $v_i^t \ne v_i^{t+1}$). If the randomized sequence satisfies an analogue of $k$-stability: 

$$\frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\left[\kappa_i(v_i^{1:T}, x_i^{1:T})\right] \le k \qquad (3.1)$$

and players use an adaptive learning algorithm with constant $C_R$ then: 

$$\sum_t \mathbb{E}[W(s^t; v^t)] \ge \frac{\lambda}{\alpha \max\{1, \mu\}} \sum_t \mathbb{E}[\mathrm{Opt}(v^t)] - C_R \sqrt{T \cdot m \cdot (k \cdot n + m) \cdot \ln(NT)}$$

where $m$ is such that for any feasible allocation $x$, $|\{i : x_i \ne \emptyset\}| \le m$. 

Observe that, unlike the definition of $k_i(v_i^{1:T}, x_i^{1:T})$, the count $\kappa_i(v_i^{1:T}, x_i^{1:T})$ does not account for 
changes in the type of players that are not currently allocated an item in solution $x^t$. 


Removing the dependence on $T$. In all the theorems of this section there is a logarithmic 
dependence of the average regret on the time horizon $T$. In the efficiency theorems throughout 
the paper, this leads to the requirement that the probability of change $p$ be at most a quantity that is inversely 
proportional to $\log(T)$. As we want to think of $T$ as a really large quantity, one might argue that 
this dependence makes the requirements on $p$ very harsh. However, we note that this dependence 
on $T$ is not essential and is only for simplicity of exposition. The quantity that should actually 
go into the regret bounds presented in this section is rather of the order of the expected lifespan 
of any player in the repeated game, which is of the order of $1/p$. Therefore the $\log(T)$ terms in the 
theorems of this section can be replaced by terms that are roughly $O(\log(1/p))$. 

In Section EC.4 of the supplementary material we formalize this argument and provide a detailed 
proof of how to remove the dependence on $T$ in all our theorems. 


4. Stable Sequences via Greedy Algorithms 

In this section we offer direct arguments to show the existence of stable solution sequences and 
hence good efficiency results for games with dynamic population. We prove efficiency results for 
the case of matching markets with dynamic population and the case of proportional bandwidth 
allocation with dynamic population. Our method is based on a combination of using the greedy 
algorithm and rounding the input parameters. 


4.1. Matching markets 

As a first application we focus on a repeated mechanism with dynamic population $\mathcal{M} = (M, T, p)$, 
where the stage mechanism is simultaneous first price auction with unit-demand bidders (matching 
markets). To apply our improved theorem, Theorem 3.3, we need two things: i) that the 
mechanism is solution-based $(\lambda, \mu)$-smooth, and ii) that there exists a relatively stable sequence of 
approximately optimal solutions for the optimization problem. 

We start by showing that the mechanism is smooth. $(1/2, 1)$-smoothness of the simultaneous first 
price auction with submodular bidders (a superset of unit-demand valuations) and continuous 
bids was shown by Syrgkanis and Tardos (2013). We consider discrete bidding spaces. A simple 
modification of the result of Syrgkanis and Tardos (2013) shows that if the discretization is fine 
enough, then the mechanism is approximately $(1/2, 1)$-solution based smooth. We will present the 
more general result for submodular valuations, as we will re-use this fact in Section 5.3 where we 
consider more general valuations. 

Lemma 4.1 (Smoothness of simultaneous first price auction). The simultaneous first price 
mechanism where players are restricted to bid on at most $d$ items and on each item submit a 
bid that is a multiple of $\delta \cdot \rho$, is a solution based $(\frac{1}{2} - \delta, 1)$-smooth mechanism, when players have 
submodular valuations, such that all marginals are either 0 or at least $\rho$ and such that each player 
wants at most $d$ items, i.e. $v_i(S) = \max_{T \subseteq S : |T| = d} v_i(T)$. 

To get a stable and approximately optimal allocation, we use a layered version of the greedy 
algorithm. The greedy matching algorithm considers item valuations $v_i(j)$ in decreasing order and 
assigns item $j$ to player $i$ if, when $v_i(j)$ is considered, neither item $j$ nor player $i$ is matched. 
To make this algorithm more stable we define the greedy-layered matching algorithm, which works 
as follows. Let $\rho > 0$ be the smallest non-zero value that a player has for any item. For a positive 
$\epsilon < 1/3$, we round each player's value down to the closest number of the form $\rho(1 + \epsilon)^\ell$ for some 
integer $\ell$, and run the greedy algorithm with these rounded values. It is well known that the greedy 
algorithm guarantees a solution that is within a factor of 2 of optimal. We lose an additional factor 
of $(1 + \epsilon)$ by working with the rounded values. The greedy algorithm will have many ties and we 
will resolve ties in a way that makes the output stable. 
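
A minimal sketch of the rounding-plus-greedy idea follows. The names `layer` and `greedy_layered_matching` are hypothetical, and the specific tie-breaking rule used here (within a layer, prefer the pair that was matched in the previous round) is an assumption chosen to illustrate stability; the text only states that ties are resolved so as to make the output stable.

```python
import math


def layer(value, rho, eps):
    """Largest integer l with rho * (1 + eps)**l <= value, assuming value >= rho.

    The small tolerance guards against floating-point error at layer boundaries.
    """
    return math.floor(math.log(value / rho, 1 + eps) + 1e-9)


def greedy_layered_matching(values, rho, eps, previous=None):
    """Greedy matching on values rounded down to layers rho * (1 + eps)**l.

    values:   dict (player, item) -> positive value; zero values are omitted.
    previous: dict item -> player from the last round, used only to break ties.
    Returns a dict item -> player.
    """
    previous = previous or {}

    def sort_key(entry):
        (player, item), value = entry
        keeps_item = previous.get(item) == player
        # Higher layer first; inside a layer, incumbents come before challengers.
        return (-layer(value, rho, eps), not keeps_item)

    assignment, matched_players = {}, set()
    for (player, item), _ in sorted(values.items(), key=sort_key):
        if player not in matched_players and item not in assignment:
            assignment[item] = player
            matched_players.add(player)
    return assignment


if __name__ == "__main__":
    values = {("A", "x"): 1.0, ("B", "x"): 0.95, ("B", "y"): 0.3, ("A", "y"): 0.3}
    print(greedy_layered_matching(values, rho=0.1, eps=0.25))
    print(greedy_layered_matching(values, rho=0.1, eps=0.25, previous={"x": "B", "y": "A"}))
```

In the usage example, the values 1.0 and 0.95 fall in the same layer, so an incumbent holder of item x keeps it even though a challenger has a slightly higher unrounded value.
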


Lemma 4.2 (Stability via the greedy algorithm). Consider a repeated matching market 
mechanism with dynamic population $\mathcal{M} = (M, T, p)$, with $m$ items and $n$ players, where $\rho$ is the 
minimum possible non-zero valuation. Assuming $T \ge 1/p$, the greedy-layered matching algorithm 
with parameter $\epsilon$ guarantees that $W(x^t; v^t) \ge \frac{1}{2(1 + \epsilon)} \mathrm{Opt}(v^t)$ for all $t$, and it can be implemented so 
that the average (over players) expected number of changes in the allocation sequence or the type 
for players who hold an item at the time of the change is upper bounded by 

$$\frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\left[\kappa_i(v_i^{1:T}, x_i^{1:T})\right] \le \;\ldots \qquad (4.1)$$


Theorem 4.1 (Main theorem for matching markets). In the simultaneous first price 
auction mechanism with dynamic population and unit-demand bidders, if all players use adaptive 
learning algorithms with constant $C_R$ and if $T \ge 1/p$ we have: 

$$\sum_t \mathbb{E}[W(s^t; v^t)] \ge \frac{1}{4(1 + \epsilon)} \sum_t \mathbb{E}[\mathrm{Opt}(v^t)] - mT \cdot C_R \sqrt{6 \cdot p \cdot \log_{1+\epsilon}(1/\rho) \cdot \ln(NT)} \qquad (4.2)$$

where $N$ is the number of different strategies considered by a player. 

If in addition we assume that all items get allocated at each round for the minimum value of $\rho$, 
or that the average optimal welfare in each round is at least $m\rho$, that is $\frac{1}{T} \sum_t \mathbb{E}[\mathrm{Opt}(v^t)] \ge m\rho$, 
then we can also get a purely multiplicative bound: 

$$\sum_t \mathbb{E}[W(s^t; v^t)] \ge \frac{1}{4(1 + 3\epsilon)} \sum_t \mathbb{E}[\mathrm{Opt}(v^t)] \qquad (4.3)$$

if the turnover probability $p$ is at most $C \cdot \frac{\epsilon^2 \rho^2}{\ln(NT)}$ for $C = \left(96 (1 + \epsilon)^2 (C_R)^2 \log_{1+\epsilon}(1/\rho)\right)^{-1}$. 

Remark 4.1. An interesting feature of Theorem 4.1 is that the probability $p$ is independent of 
the number of players $n$ and the number of items $m$, implying that the game can accommodate 
extremely high turnover in player population, as the number of players increases, without losing 
in the quality of the outcome. The probability $p$ required for the high quality solution needs to 
depend only on $\log_{1+\epsilon}(1/\rho)$, $\log N$ and $\log T$, where $N$ is bounded by $\frac{m}{\delta \cdot \rho}$ and the dependence on 
$T$ can be removed as presented in Section EC.4 of the supplementary material. 

The high-level intuition why the greedy algorithm can sustain such a rate of change is as follows. 
At any time-step the only players that incur any non-zero regret are the players to whom the 
greedy solution currently allocates some item. Since the optimal welfare is at least $m \cdot \rho$, if we 
want the efficiency to be $\epsilon$ close to what is implied by having absolutely no regret for the greedy 
layered algorithm we need the total regret in the system to be at most $\epsilon \cdot m \cdot \rho$. In other words, 
we need the regret associated with each item to be at most $\epsilon \cdot \rho$. Now observe that when an item 
is allocated to a player in the highest level, i.e. with a value within a $(1 + \epsilon)$ factor of the maximum 
value 1, then this player is never unassigned from that item until he leaves the game. Thus we can 
roughly view the lifetime of an item as decomposing into $p \cdot T$ cycles such that during each cycle 
the item transitions from level-1 players to level-$\log_{1+\epsilon}(1/\rho)$ players. In other words, the lifetime 
of an item splits in roughly $pT \log_{1+\epsilon}(1/\rho)$ stable allocation intervals, leading to average interval 
length $(p \log_{1+\epsilon}(1/\rho))^{-1}$ and thereby average regret at most $\sqrt{p \log_{1+\epsilon}(1/\rho)}$. Since we want 
this regret to be at most $\epsilon \cdot \rho$, we get $p \le \frac{\epsilon^2 \rho^2}{\log_{1+\epsilon}(1/\rho)}$, essentially the bound we have in 
Theorem 4.1. 


4.2. Bandwidth allocation 

As a second application we focus on a repeated mechanism with dynamic population $\mathcal{M} = (M, T, p)$ 
where the stage mechanism is the proportional bandwidth allocation mechanism. Recall the 
bandwidth sharing mechanism, where every player $i$ submits and pays a bid $b_i$, and the available 
bandwidth $B$ (which we assume is $B = 1$ for notational simplicity) is divided proportionally to the 
players' bids, so bidder $i$ gets bandwidth $x_i(b) = \frac{b_i}{\sum_j b_j}$ and pays $b_i$. We assume that the player's 
utility is quasilinear, so if the player's valuation function is $v_i(x)$ for $x$ amount of bandwidth 
then the resulting utility is $u_i(b) = v_i(x_i(b)) - b_i$. Following Kelly (1997) and Johari and Tsitsiklis 
(2011), we will assume that the players' valuation functions $v_i : [0, B] \to \mathbb{R}$ are increasing, concave 
and differentiable. Further, we will make some Lipschitz style assumptions on the rate of change 
of the value functions. Concretely, we will assume the following: 

1. Value functions $v_i(x)$ are increasing, concave and twice differentiable, $v_i(0) = 0$ and $v_i(B) \le 1$. 

2. The rate of increase is at least $\rho$, i.e. $\forall i, x : v_i'(x) \ge \rho$. 

3. The gradient is $\alpha$-Lipschitz, i.e. $\forall i, x : |v_i''(x)| \le \alpha$. 

Following a similar approach as in the previous section, we can derive an efficiency guarantee in 
this setting too. 

(Footnote to Section 4.1, on viewing the lifetime of an item as a sequence of cycles: not completely 
accurate as players can leave to other items too, but a good approximation.) 

Theorem 4.2 (Main theorem for bandwidth allocation). Consider the proportional 
bandwidth sharing game with dynamic population and with valuations satisfying the conditions listed 
above. If all players use adaptive learning algorithms with constant $C_R$ and if $T \ge 1/p$ then we have: 

E, EHr(s‘; v‘)l > E. E[Opt(v‘)1 

4 4 

if the turnover probability p is at most C ■ ^ 2 i^Int) P'’" ^ ~ “ ^^)^(C'rj)^log(i+j)(a(l — 

e)/p^e))-\ 

The high-level outline of the proof consists of three lemmas. 

• As a benchmark optimization problem, we consider the $\delta$-segmented bandwidth allocation 
problem for some $\delta > 0$, where all allocated bandwidths are integer multiples of $\delta$. We show that the 
Lipschitz condition above ensures that for a small enough $\delta > 0$, the segmented optimum is not 
much smaller than the true optimum. 

Lemma 4.3. The social welfare of the optimal 5-segmented solution approximates within (1 — e) 
the optimum if S <^. 

• To get a stable and approximately optimal allocation for the $\delta$-segmented bandwidth problem,
we use a layered version of the greedy algorithm, similar to our greedy matching algorithm in
Section 4.1. We divide the bandwidth into segments of length $\delta$. The greedy bandwidth allocation
algorithm greedily allocates segments based on the marginal increase in the players' valuation
function. We will denote as $v_{ij}$ the marginal valuation that player $i$ has for her $j$-th segment. Note
that, due to concavity of the valuation function, $v_{ij}$ is a non-increasing function of $j$ and, due to
the lower bound on the gradient, it is at least $\rho\delta$. The greedy algorithm is therefore optimal for
the $\delta$-segmented bandwidth problem.

To make it more stable, similarly as in the matching markets, we use a layered version of the
valuation functions where the layer of some marginal valuation $v_{ij}$ is the highest $\ell$ such that
$v_{ij} \ge \rho\delta(1+\epsilon)^{\ell-1}$. We will use $\ell^t(j)$ to denote the layer that the $j$-th most valued (as in marginal
values) segment was assigned at step $t$. We will again select the tie-breaking rule across marginal
values of the same layer to facilitate stability, i.e. previous holders of segments are favored in the
tie-breaks so that they keep the same number of segments as they had before. We show that the greedy layered
algorithm for the $\delta$-segmented bandwidth allocation problem finds a solution within a $(1+\epsilon)$ factor
of the welfare of the optimal $\delta$-segmented solution, and that the sequence of solutions found by
this greedy algorithm is stable.
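The layered greedy rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation analyzed in the paper: the names `layer` and `layered_greedy`, the representation of valuations as callables on $[0,1]$, and the `prev_counts` argument used for the tie-breaking are all hypothetical, and the sketch assumes concave valuations and that $1/\delta$ is an integer.

```python
def layer(value, rho, delta, eps):
    """Highest l >= 1 with value >= rho*delta*(1+eps)**(l-1); 0 if value < rho*delta."""
    base = rho * delta
    if value < base:
        return 0
    l = 1
    while value >= base * (1 + eps) ** l:
        l += 1
    return l


def layered_greedy(valuations, delta, rho, eps, prev_counts=None):
    """Allocate the 1/delta segments one at a time by layer of marginal value.

    valuations  : list of concave functions v_i on [0, 1] (assumed input format)
    prev_counts : segments each player held in the previous step (assumed);
                  ties inside a layer favor players still below their old count.
    Returns the number of segments given to each player.
    """
    n = len(valuations)
    num_segments = round(1 / delta)
    prev_counts = prev_counts or [0] * n
    counts = [0] * n
    for _ in range(num_segments):
        def key(i):
            j = counts[i] + 1  # index of player i's next segment
            marginal = valuations[i](j * delta) - valuations[i]((j - 1) * delta)
            keeps = 1 if counts[i] < prev_counts[i] else 0
            return (layer(marginal, rho, delta, eps), keeps, -i)
        best = max(range(n), key=key)
        counts[best] += 1
    return counts
```

For example, `layered_greedy([lambda x: x, lambda x: x], 0.25, 1.0, 0.1, prev_counts=[0, 4])` leaves all four segments with the previous holder, which is the kind of stability the tie-breaking rule is meant to provide.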
Lemma 4.4. Consider a repeated $\delta$-segmented bandwidth allocation game with dynamic population
$\Gamma = (M, T, p)$ and $n$ players. Assuming $T \ge 1/p$, the greedy layered algorithm with parameter $\epsilon$
guarantees that $W(x^t; v^t) \ge \frac{1}{1+\epsilon}\mathrm{Opt}(v^t)$ for all $t$, and it can be implemented so that the average
(over players) expected number of changes in the allocation sequence or the type for players who
hold an item at the time of the change is upper bounded by

i E"., E [«. {vr, xr)\ < 


Finally, we need to show that the proportional sharing mechanism is smooth.
Syrgkanis and Tardos (2013) showed that the mechanism is $(2 - \sqrt{3} - \epsilon, 1)$-smooth using a
randomized deviation. To use this deviation in our framework, we want to consider a discretized bidding
space. We show that for every $\epsilon > 0$, the proportional allocation mechanism is $(2 - \sqrt{3} - \epsilon, 1)$-solution
based smooth with respect to any solution of the $\delta$-segmented bandwidth allocation problem, using
the discretized deviation.


Lemma 4.5. The proportional mechanism allowing only bids that are multiples of $\zeta = \epsilon\delta$ is
$(2 - \sqrt{3} - \epsilon, 1)$-solution based smooth with respect to any $\delta$-segmented allocation.


Combining these lemmas, we use Theorem 3.3 to get the claimed efficiency result.

Proof of Theorem \4.^ From Lemmas 14.3114.4114.51 and Theorem 13.31 setting 5 = , we get 

that the aggregate social welfare of the proportional allocation bandwidth is of the 

optimum. This is achieved for turnover probability p: 

<_ [P^ _ 

^ - 6 • 16(1 + eY{CnY log(i+,)(l/p(5) In(iVr)' 

Replacing (5 , the result follows. □ 


5. Stable Sequences via Differential Privacy 

In this section we formally connect joint differential privacy with the construction of stable
sequences needed by our main Theorems 3.1 and 3.2. In Appendix EC.2 we offer a strengthening
of these theorems that allows us to use marginal differential privacy. Differential privacy offers a
general framework to find solutions that are close to optimal, yet more stable to changes in the 
input than the optimum itself. To guarantee privacy, the output of the algorithm is required to 
depend only minimally on any player’s input. This is exactly what we need in our framework. 

Theorem 5.1 (Stable sequences via privacy). Suppose there exists an algorithm
$\mathcal{A} : \mathcal{V}^n \to \Delta(\mathcal{X}^n)$ that is $(\epsilon,\delta)$-jointly differentially private, takes as input a valuation profile $v$ and outputs a
distribution of solutions such that a sample from this distribution is feasible with probability $1-\beta$,
and is $\alpha$-approximately efficient in expectation (for $0 < \epsilon < 1/2$, $\alpha > 1$ and $\delta, \beta > 0$).
Consider a sample $v^{1:T}$ from the distribution of valuations produced by the adversary in a repeated
cost-minimization game with dynamic population $\Gamma = (G, p, T)$. There exists a randomized sequence
of solutions $x^{1:T}$ for the sequence $v^{1:T}$ such that for each $1 \le t \le T$, $x^t$ conditional on $v^t$ is
an $\alpha$-approximation to $\mathrm{Opt}(v^t)$ in expectation and the joint randomized sequence $(v^{1:T}, x^{1:T})$ is
$pT(1 + n(2\epsilon + 2\beta + \delta))$-stable (as in Definition 3.1).

We defer the proof of Theorem 5.1 to the next subsection. Combining Theorem 5.1 with Theorem
3.1 and Theorem 3.2 we immediately get the following corollary.

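Corollary 5.1. Consider a repeated cost game with dynamic population $\Gamma = (G, T, p)$, such that
the stage game $G$ is allocation based $(\lambda, \mu)$-smooth and $T \ge 1/p$. Assume that there exists an $(\epsilon,\delta)$-joint
differentially private algorithm $\mathcal{A} : \mathcal{V}^n \to \mathcal{X}^n$ with error parameter $\beta$ that satisfies the
conditions of Theorem 5.1. If all players use adaptive learning algorithms with constant $C_R$ in the
repeated game then the overall cost of the solution is at most:

$$\sum_t \mathbb{E}[C(s^t; v^t)] \le \frac{\lambda\alpha}{1-\mu}\sum_t \mathbb{E}[\mathrm{Opt}(v^t)] + \frac{nT}{1-\mu}\cdot C_R\sqrt{2p(1 + n(\epsilon + \beta + \delta))\ln(NT)}$$

Similarly for a mechanism we get:

$$\sum_t \mathbb{E}[W(s^t; v^t)] \ge \frac{\lambda}{\alpha\max\{1,\mu\}}\sum_t \mathbb{E}[\mathrm{Opt}(v^t)] - \frac{nT}{\max\{1,\mu\}}\cdot C_R\sqrt{2p(1 + n(\epsilon + \beta + \delta))\ln(NT)}$$


5.1. Proof of Theorem 5.1
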

We will use total variation distance to measure the distance between distributions. For two
distributions $\mu$ and $\eta$ on some finite probability space $\Omega$ the following are two equivalent versions of the
total variation distance:

$$d_{tv}(\mu,\eta) = \frac{1}{2}\|\mu - \eta\|_1 = \max_{A \subseteq \Omega}\big(\mu(A) - \eta(A)\big), \tag{5.1}$$

where in the 1-norm in the middle we think of $\mu$ and $\eta$ as vectors of probabilities over the possible
outcomes.
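The equivalence of the two expressions for the total variation distance can be checked numerically on a small example. The sketch below is illustrative only; the function names and the dictionary representation of distributions are assumptions of this example, and the event-maximizing version enumerates all subsets, so it is only usable on tiny supports.

```python
from fractions import Fraction
from itertools import combinations


def tv_half_l1(mu, eta):
    """Total variation distance as half of the 1-norm of mu - eta."""
    support = set(mu) | set(eta)
    return sum(abs(mu.get(w, 0) - eta.get(w, 0)) for w in support) / 2


def tv_max_event(mu, eta):
    """Total variation distance as the maximum of mu(A) - eta(A) over all events A."""
    support = sorted(set(mu) | set(eta))
    best = 0
    for k in range(len(support) + 1):
        for event in combinations(support, k):
            gap = sum(mu.get(w, 0) - eta.get(w, 0) for w in event)
            best = max(best, gap)
    return best


# Two distributions on {a, b, c}, written with exact fractions.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
eta = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
```

On this example both functions return 1/3, and the maximizing event is $A = \{a, b\}$, the set of outcomes where $\mu$ puts more mass than $\eta$.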


Lemma 5.1. Suppose that $\mathcal{A} : \mathcal{V}^n \to \Delta(\mathcal{X}^n)$ is an $(\epsilon,\delta)$-joint differentially private algorithm with
failure probability $\beta$ (for $0 < \epsilon < 1/2$ and $\delta, \beta > 0$) that takes as input a valuation profile $v$ and
outputs a distribution over feasible solutions $\sigma$. Let $\sigma$ and $\sigma'$ be the algorithm's outputs on two
inputs $v$ and $v'$ that differ only in coordinate $i$. Then we can bound the total variation distance
between $\sigma_{-i}$ and $\sigma'_{-i}$ by $d_{tv}(\sigma_{-i}, \sigma'_{-i}) \le 2\epsilon + \delta$.

Proof of Lemma 5.1. Condition (2.5) of joint differential privacy guarantees that if we let $S \subseteq \mathcal{X}^{n-1}$
be a subset of possible solutions for players other than $i$ and with $\sigma_{-i}(S)$ and $\sigma'_{-i}(S)$ the
probability that the two distributions assign on $S$, then for any $S$: $\sigma_{-i}(S) \le \exp(\epsilon)\,\sigma'_{-i}(S) + \delta$.
Since $\epsilon < 1/2$, we can use the bound $\exp(\epsilon) \le 1 + 2\epsilon$ to get that
$\sigma_{-i}(S) - \sigma'_{-i}(S) \le 2\epsilon\,\sigma'_{-i}(S) + \delta \le 2\epsilon + \delta$.
Thus by the second definition of the total variation distance in Equation (5.1) we get that
$d_{tv}(\sigma_{-i}, \sigma'_{-i}) \le 2\epsilon + \delta$. □
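The elementary bound $\exp(\epsilon) \le 1 + 2\epsilon$ used in this proof follows from convexity of the exponential. The short derivation below is added for completeness and is not part of the original argument.

```latex
% For 0 <= eps <= 1/2, convexity puts the graph of exp below the chord
% through the points (0, 1) and (1/2, e^{1/2}).
\[
  e^{\epsilon} \;\le\; 1 + \frac{e^{1/2} - 1}{1/2}\,\epsilon
  \;=\; 1 + 2\,(\sqrt{e} - 1)\,\epsilon
  \;\le\; 1 + 2\epsilon ,
  \qquad 0 \le \epsilon \le \tfrac{1}{2},
\]
% where the last step uses \sqrt{e} - 1 \approx 0.649 < 1.
```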

To facilitate the proof we need a simple lemma from basic probability theory.

Lemma 5.2 (Coupling Lemma). Let $\mu$ and $\eta$ be two probability measures over a finite set $\Omega$.
There is a coupling $\omega$ of $(\mu, \eta)$, such that if the random variable $(X, Y)$ is distributed according to
$\omega$, then the marginal distribution on $X$ is $\mu$, the marginal distribution on $Y$ is $\eta$, and

$$\Pr[X \ne Y] = d_{tv}(\mu, \eta).$$
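One standard way to realize the coupling of Lemma 5.2 is the so-called maximal coupling: put as much mass as possible on the diagonal, and spread the leftover mass of the two measures over off-diagonal pairs. The following Python sketch illustrates this construction; the function name and the dictionary representation are assumptions of this example, and exact fractions are used so that the marginals can be checked without rounding error.

```python
from fractions import Fraction


def maximal_coupling(mu, eta):
    """Return a joint distribution {(x, y): prob} with marginals mu and eta
    such that Pr[X != Y] equals the total variation distance of mu and eta."""
    support = sorted(set(mu) | set(eta))
    # Mass that both measures agree on is placed on the diagonal.
    overlap = {w: min(mu.get(w, 0), eta.get(w, 0)) for w in support}
    joint = {(w, w): overlap[w] for w in support if overlap[w] > 0}
    # Leftover mass; the two leftovers have disjoint supports.
    excess_mu = {w: mu.get(w, 0) - overlap[w] for w in support}
    excess_eta = {w: eta.get(w, 0) - overlap[w] for w in support}
    tv = sum(excess_mu.values())  # equals the total variation distance
    if tv > 0:
        for x in support:
            for y in support:
                mass = excess_mu[x] * excess_eta[y] / tv
                if mass > 0:
                    joint[(x, y)] = joint.get((x, y), 0) + mass
    return joint


mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
eta = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
joint = maximal_coupling(mu, eta)
prob_differ = sum(p for (x, y), p in joint.items() if x != y)
```

Here `prob_differ` equals 1/3, which is the total variation distance between the two example distributions.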


Proof of Theorem 5.1. Suppose that $\mathcal{A} : \mathcal{V}^n \to \Delta(\mathcal{X}^n)$ is an $(\epsilon,\delta)$-joint differentially private
algorithm as described in the statement of the theorem. The differentially private algorithm fails with
probability $\beta$. We will denote with $\sigma$ the output distribution over solutions for an input $v$, where
we use the optimal solution in the low probability event that the algorithm fails. (Equivalently $\mathcal{A}$
could be a randomized algorithm and $\sigma$ its implicit distribution over solutions.)

Let $\sigma^1, \ldots, \sigma^T$ be the sequence of distributions output by the private algorithm when run on
a deterministic sequence of valuation profiles $v^1, \ldots, v^T$ with the modification described in the
paragraph above. To simplify the discussion we will assume that only one player changes valuation
at each time-step $t$. Essentially we are breaking every transition from time-step $t$ to $t+1$ into
many sequential transitions where only one player changes at every time step, and then deleting
the solutions from the resulting sequence that correspond to the added steps. Thus the number of
steps within this proof should be thought of as being equal to $n \cdot p \cdot T$ in expectation.

By differential privacy we know that the total variation distance of two consecutive distributions
without the modification of replacing failures with the optimal solution is at most $2\epsilon + \delta$. Since,
by the union bound, the probability that any of the two consecutive runs of the algorithm fail is
at most $2\beta$, we can show that the total variation distance of the latter modified output is at most
$2\epsilon + \delta + 2\beta$, i.e. for any $t \in [T]$: $d_{tv}(\sigma^t_{-i}, \sigma^{t+1}_{-i}) \le 2\epsilon + \delta + 2\beta$ (see Lemma 5.3 for a formal proof).

We can turn the sequence of distributions $\sigma^1, \ldots, \sigma^T$ into a distribution of sequences of allocations
by coupling the randomness used to select the solutions in different distributions $\sigma^t$. To do
this, we take advantage of the coupling lemma from probability theory, Lemma 5.2. If at step $t$ no
player changes values, then $\sigma^t = \sigma^{t+1}$ and we select the same outcome from the two distributions,
so we get $\Pr[x^t \ne x^{t+1}] = 0$.

Now consider a step in which a player $i$ changes her private type $v_i$. We use Lemma 5.2 to couple
$x^{t+1}_{-i}$ and $x^t_{-i}$ so that³

$$\Pr[x^t_{-i} \ne x^{t+1}_{-i}] = d_{tv}(\sigma^{t+1}_{-i}, \sigma^t_{-i}) \le 2\epsilon + \delta + 2\beta. \tag{5.2}$$

³ One can think of it as sampling $x^{t+1}_{-i}$ conditional on $x^t_{-i}$ and assuming the joint distribution of $x^t_{-i}$ and $x^{t+1}_{-i}$ is as
prescribed by the coupling lemma applied to $\sigma^t_{-i}$ and $\sigma^{t+1}_{-i}$. This is to address concerns that $x^t$ is already coupled with
$x^{t-1}$ in the previous step.
Note that this couples the $i$th coordinate $x^{t+1}_i$ and $x^t_i$ in an arbitrary manner, which is fine, as we
assumed that the valuation of player $i$ changes at this step.

We have defined a probability distribution of sequences $x^{1:T}$ for every fixed sequence of valuations
$v^{1:T}$. We extend this definition to random sequences of valuations in the natural way, adding the
distribution of valuations $v^{1:T}$.

We claim that the resulting random sequence of (valuation, solution) pairs satisfies the statement
of the theorem: the $\alpha$-approximation follows by the guarantees of the private algorithm and by
the fact that we use the optimal solution when the algorithm fails. Next we argue about the
stability of the sequence. Consider a player $i$, and the distribution of her sequence $(v_i^{1:T}, x_i^{1:T})$.
In each step $t$ her valuation $v_i^t$ changes with probability $p$, contributing $pT$ in expectation to the
number of changes. In a step $t$ when the value of some other player $j \ne i$ changes, we use (5.2) to bound the
probability that $x_i^t \ne x_i^{t+1}$ by $2\epsilon + \delta + 2\beta$. Thus any change in the value of some other player $j$
contributes $(2\epsilon + 2\beta + \delta)$ to the expectation of the number of changes for player $i$. The expected
number of such changes in other values is $(n-1)pT$ over the sequence, showing that the sequence
is $pT + (n-1)pT(2\epsilon + 2\beta + \delta) \le pT(1 + n(2\epsilon + 2\beta + \delta))$ stable, as claimed. □
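To get a feeling for the size of the stability parameter, the counting argument at the end of the proof can be evaluated numerically. The helper below is a hypothetical illustration (its name and the sample parameter values are not from the paper); it returns both the exact count from the argument and the slightly looser closed form stated in the theorem.

```python
def stability_bound(p, T, n, eps, delta, beta):
    """Expected number of changes for one player, as counted in the proof.

    Returns (exact, stated): `exact` is pT + (n-1) pT (2 eps + 2 beta + delta),
    `stated` is the closed form pT (1 + n (2 eps + 2 beta + delta)).
    """
    per_other_change = 2 * eps + 2 * beta + delta  # bound from (5.2)
    own_changes = p * T                            # the player's own type changes
    induced = (n - 1) * p * T * per_other_change   # changes caused by other players
    exact = own_changes + induced
    stated = p * T * (1 + n * per_other_change)
    return exact, stated


# Sample values chosen only for illustration.
exact, stated = stability_bound(p=0.01, T=1000, n=100, eps=0.001, delta=0.0005, beta=0.0005)
```

With these sample values a player's own turnover accounts for 10 expected changes and the changes induced by the other 99 players add only about 3.5 more, which is the sense in which the private solution sequence is stable.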

Lemma 5.3. Let $q$ and $q'$ be the output of an $(\epsilon,\delta)$-joint differentially private algorithm with failure
probability $\beta$, on two valuation profiles $v$ and $v'$ that differ only in coordinate $i$. Let $\sigma$ and $\sigma'$ be
the modified output where the outcome is replaced with the optimal outcome when the algorithm fails.
Then:

$$d_{tv}(\sigma, \sigma') \le 2\epsilon + \delta + 2\beta$$

Proof of Lemma 5.3. Consider two coupled random variables $y, y'$ that are implied by
Lemma 5.2 applied to distributions $q$ and $q'$, such that $y \sim q$ and $y' \sim q'$ and $\Pr[y \ne y'] = d_{tv}(q, q') \le
2\epsilon + \delta$ (by $(\epsilon,\delta)$-joint privacy). Now consider two other random variables $x$ and $x'$ where $x = y$
except for the cases where $y$ is an outcome of a failure, in which case $x$ is equal to the welfare optimal
outcome, and similarly for $x'$ and $y'$. Obviously $x \sim \sigma$ and $x' \sim \sigma'$, thus $(x, x')$ is a valid coupling for
distributions $\sigma$ and $\sigma'$. Thus if we show that $\Pr[x \ne x'] \le 2\epsilon + \delta + 2\beta$, then by properties of total
variation distance $d_{tv}(\sigma, \sigma') \le \Pr[x \ne x'] \le 2\epsilon + \delta + 2\beta$, which is the property we want to show.

Let fail be the event that either $y$ or $y'$ is the outcome of a failed run of the algorithm. Then by
the union bound $\Pr[\text{fail}] \le 2\beta$. Thus we have:

$$\begin{aligned}
\Pr[x \ne x'] &= \Pr[x \ne x' \mid \neg\text{fail}] \cdot \Pr[\neg\text{fail}] + \Pr[x' \ne x \mid \text{fail}] \cdot \Pr[\text{fail}] \\
&\le \Pr[x \ne x' \mid \neg\text{fail}] \cdot \Pr[\neg\text{fail}] + 2\beta \\
&= \Pr[y \ne y' \mid \neg\text{fail}] \cdot \Pr[\neg\text{fail}] + 2\beta \\
&\le \Pr[y \ne y'] + 2\beta \le d_{tv}(q, q') + 2\beta \le 2\epsilon + \delta + 2\beta
\end{aligned}$$

This completes the proof of the Lemma. □
5.2. Large Congestion Games with Dynamic Population

Our first application of differential privacy is for the atomic congestion game with dynamic
population, defined in Section 2. Rogers et al. (2015) give a jointly differentially private algorithm for
finding an optimal solution in congestion games, called the Private gradient descent algorithm. They
focus on routing games due to the paper's focus on tolls as mediators, but their algorithm works
in full generality for any atomic congestion game.

We illustrate our technique with linear latencies $\ell_e(x) = a_e x + b_e$. We assume latency is monotone
increasing, i.e., $a_e > 0$ for all $e \in E$ and that $b_e \ge 0$. The algorithm of Rogers et al. (2015) assumes
that $\ell_e(x) \le 1$ for all $e$. To achieve this we need to scale latencies by $n \max_e(a_e + b_e)$. This makes
the functions $\gamma$-Lipschitz for $\gamma = 1/n$. For this case, the algorithm outputs an integer solution that
satisfies $(\epsilon, \delta)$ joint differential privacy, and has an error probability of $\beta$ for parameters $\epsilon, \delta, \beta > 0$,
and for player types $v$ with probability $1 - \beta$ returns a solution $x$ with cost in expectation over the
randomization of the algorithm

rrp t ri'y^ t 

E[C'(x;v)] <Opt(v)H - ^ -polylog(e, l/J, 1//3, n, m). (5.3) 

We can combine this differentially private algorithm with Corollary 5.1 for a class of latency
functions $\ell(x)$ for which we have good smoothness properties. Linear latencies $\ell_e(x) = a_e x + b_e$
are $(5/3, 1/3)$-smooth (Christodoulou and Koutsoupias 2005, Awerbuch et al. 2013,
Roughgarden 2009). The same proof also gives:


Lemma 5.4. Congestion games with linear latencies $\ell_e(x) = a_e x + b_e$ for $a_e, b_e \ge 0$ are $(5/3, 1/3)$-
smooth with respect to any solution $x$.
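The usual route to the $(5/3, 1/3)$ constants goes through an elementary inequality on nonnegative integers, $y(x+1) \le \frac{5}{3}y^2 + \frac{1}{3}x^2$, applied resource by resource with $x$ and $y$ the loads in the current and in the benchmark solution. We recall this inequality from the cited smoothness literature rather than from the present text, so it should be read as an assumption of this illustration. The sketch below, with hypothetical function names, checks it by brute force on a grid and lists the points where it is tight.

```python
def smoothness_inequality_holds(x, y):
    """Check y*(x+1) <= (5/3)*y^2 + (1/3)*x^2, multiplied through by 3 to stay in integers."""
    return 3 * y * (x + 1) <= 5 * y * y + x * x


def check_grid(limit):
    """True if the inequality holds for all integer loads 0 <= x, y <= limit."""
    return all(
        smoothness_inequality_holds(x, y)
        for x in range(limit + 1)
        for y in range(limit + 1)
    )


def tight_points(limit):
    """Integer points in the grid where the inequality holds with equality."""
    return [
        (x, y)
        for x in range(limit + 1)
        for y in range(limit + 1)
        if 3 * y * (x + 1) == 5 * y * y + x * x
    ]
```

The tight points on a small grid are $(0,0)$, $(1,1)$ and $(2,1)$, which indicates that the constants cannot both be improved for integer loads.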

Theorem 5.2 (Main theorem for large congestion games). Consider a repeated congestion
game with dynamic population $\Gamma = (G, T, p)$, such that $T \ge 1/p$, the stage game $G$ is an atomic
congestion game with affine latency functions $\ell_e(x) = a_e x + b_e$ with $a_e > 0$ and $b_e \ge 0$ for all $e$.
For any $\eta > 0$, if all players use adaptive learning algorithms with constant $C_R$, then the overall
expected cost is bounded by

$$\sum_t \mathbb{E}[C(s^t; v^t)] \le \frac{5}{2}(1 + \eta)\sum_t \mathbb{E}[\mathrm{Opt}(v^t)]$$


assuming the probability p of departures is at most G ■ r]'^ ■ m 

2 / . s 4 


-10 


(InT) ^ for 


G = 


12-141 


•(C'. 


R) 


^-2 


mmg Og 


(log^(m • n) ln(n)) 


-1 


^maxe(ae -|- bg) ^ 

Remark 5.1. We note that the probability $p$ depends mainly on the number of congestible
elements $m$, but depends on $n$ only in a polylogarithmic way. For large $n$, almost a constant fraction
of the player population can turn over at each step.

In Appendix EC.3 we generalize the bound to polynomial functions, and also give additive error
results for congestion games with general latency functions.
5.3. Large Markets with Dynamic Population

Next we revisit the first price auction game, but consider a much broader class of valuations: we
consider large markets with valuations that satisfy the gross substitute property. Hsu et al. (2014)
give a jointly differentially private algorithm to find a close to optimal allocation in markets where
buyers have the gross substitute property, and there are enough copies of each item. This algorithm
will allow us to derive good welfare guarantees for outcomes of adaptive learning in repeated
auctions with dynamic population using Corollary 5.1.

We will assume that the valuation functions satisfy the gross substitute property, i.e., increasing
prices outside a subset doesn't decrease the player's demand in the set.

Definition 5.1 (Gross-substitute valuation). For a price vector $p$ let $p(A) = \sum_{j \in A} p_j$ denote the
total price, and let $\omega(p)$ denote the player's most desirable sets of goods, that is, let $\omega(p) =
\arg\max_A v(A) - p(A)$. The valuation satisfies the gross substitutes condition if for every pair of
price vectors $(p, p')$ such that $p_j \le p'_j$ for all items $j$, and for every set of goods $S \in \omega(p)$, if $S' \subseteq S$ satisfies
$p'_j = p_j$ for every $j \in S'$ then there is a set $S^* \in \omega(p')$ with $S' \subseteq S^*$.
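For very small item sets, Definition 5.1 can be tested directly by enumerating bundles. The sketch below is an illustration under stated assumptions: the function names, the representation of a valuation as a function of a `frozenset`, and the two example valuations are hypothetical, and the check only examines the single price pair it is given. It uses the observation that it suffices to test the largest unchanged-price subset $S'$ of each demanded set $S$.

```python
from itertools import combinations


def subsets(items):
    """All bundles of the given items, as frozensets."""
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def demand(valuation, prices, items):
    """All bundles maximizing v(A) - p(A) at the given prices."""
    best, arg_best = None, []
    for bundle in subsets(items):
        utility = valuation(bundle) - sum(prices[j] for j in bundle)
        if best is None or utility > best:
            best, arg_best = utility, [bundle]
        elif utility == best:
            arg_best.append(bundle)
    return arg_best


def violates_gs(valuation, p, p_new, items):
    """True if the pair (p, p_new), with p <= p_new itemwise, violates Definition 5.1."""
    unchanged = frozenset(j for j in items if p_new[j] == p[j])
    new_demand = demand(valuation, p_new, items)
    for S in demand(valuation, p, items):
        kept = S & unchanged  # largest S' inside S whose prices did not move
        if not any(kept <= S_star for S_star in new_demand):
            return True
    return False


items = ["a", "b"]
unit_values = {"a": 3, "b": 2}


def unit_demand(bundle):
    """Value of the best single item in the bundle (a gross substitutes valuation)."""
    return max((unit_values[j] for j in bundle), default=0)


def complements(bundle):
    """Value 4 only for the full pair (items are complements, not substitutes)."""
    return 4 if bundle == frozenset(items) else 0
```

For the unit-demand valuation, raising the price of item `a` from 1 to 3 produces no violation, whereas for the complements valuation raising the price of `a` from 1 to 4 makes the buyer drop item `b` as well, which violates the condition.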

We will make the following large market assumptions: 

1. The number of items $ms$ is large, in particular $ms > cn$ for some constant $c < 1$.

2. In the optimal solution each item can be assigned for at least $\rho$ marginal gain. This implies
immediately that the optimal social welfare is at least $\mathrm{Opt}^t \ge \rho\, ms$ at each time $t \in [T]$.

3. The players are interested in at most $d$ types of items and want only one copy of each item
(meaning that their value for any bundle $A$ of items is equal to the maximum value among any
subset of this bundle with cardinality at most $d$).


We will use the PAlloc algorithm from Hsu et al. (2014) as our benchmark for adaptive learning.
The algorithm has two additional parameters $\alpha > 0$ and $\beta > 0$; it is $\epsilon$-jointly differentially private,
that is, $(\epsilon, 0)$-jointly differentially private, and with probability $(1 - \beta)$ it computes a feasible efficient
allocation. Assuming the supply $s$ is high enough, the social value of the allocation is at least
$\mathrm{Opt} - \alpha \cdot \max(ms, n)$ in expectation, where recall that $ms$ is the total supply, as we have $s$ copies
of $m$ different items each. Concretely, with supply $s$ we get

$$\alpha = O\!\left(\frac{1}{(s\epsilon)^{1/3}}\right) \cdot \mathrm{polylog}(n, m, s, 1/\beta) \tag{5.4}$$

In order to be able to use this algorithm as a benchmark in Corollary 5.1 we need to show that
this is an approximation algorithm with small approximation factor.


Lemma 5.5. For every $\eta > 1$, when the players' valuations satisfy the gross substitute assumption,
the algorithm PAlloc with privacy parameter $\epsilon(n)$ can be used to output an allocation, w.p. $1 - \beta(n)$,
that has social welfare at least $(1 - \frac{1}{\eta})\mathrm{Opt}$ under the large market assumptions listed above,
assuming in addition that

V = o( - , ^ ) •polylog(re,m,s,l//3(n)) 

\p- c - [s- J 

Theorem 5.3 (Main theorem for large markets). Consider a repeated large market mechanism
with dynamic population $\Gamma = (M, T, p)$, such that $T \ge 1/p$, where the stage mechanism $M$ is a
solution-based $(\lambda, \mu)$-smooth mechanism, the players have gross substitute valuations and the
market satisfies the large market assumptions. If all players use an adaptive learning algorithm with constant
$C_R$, then the overall expected social welfare is at least:

Y, i;*)) >-^ 

^ max(l,/x) ^ 

if the probability p of a player leaving is 

m-\iy{NT) 

for C = Q ((polylog(n, m, where N is the number of different strategies each player is using, 
which is at most > when bids on each item are multiples of 5- p. 

There are several mechanisms for this setting that are $(\lambda, \mu)$-smooth. As we showed in Lemma
4.1, running simultaneous first price auctions for each type of good (as described in Section 2)
results in a $(\frac{1}{2} - \delta, 1)$-solution based smooth mechanism.


A. Proofs of Main Results 


A.1. Proofs from Section 3

Proof of Theorem 3.1. Let $s_i^{*,t}$ be the deviation $s_i^*(v_i^t, x_i^t)$ defined by the smoothness property
and $s_i^{*,1:T}$ the sequence of these deviations. Let $K_i$ be the number of time steps that $s_i^{*,t} \ne s_i^{*,t+1}$
and $r_i(s_i^{*,1:T}, s^{1:T}; v^{1:T})$ the regret that player $i$ has compared to selecting $s_i^{*,t}$ at every step, i.e.:

$$r_i(s_i^{*,1:T}, s^{1:T}; v^{1:T}) = \sum_{t=1}^{T}\Big(c_i(s^t; v_i^t) - c_i(s_i^{*,t}, s_{-i}^t; v_i^t)\Big). \tag{A.1}$$

For shorthand, we denote this with $r_i^*$ in this proof. Observe that since $s_i^{*,t}$ is uniquely determined
by $v_i^t$ and $x_i^t$, $K_i$ is a random variable that is equal to $\kappa_i(v_i^{1:T}, x_i^{1:T})$, for each instantiation of the
sequences $v^{1:T}$ and $x^{1:T}$.

For any period $[\tau_r, \tau_{r+1})$ that the strategy $s_i^{*,t}$ is fixed, adaptive learning guarantees that the
player's regret for this strategy is bounded by

$$R_i(\tau_r, \tau_{r+1}) \le C_R\sqrt{(\tau_{r+1} - \tau_r)\ln(NT)}. \tag{A.2}$$









Lykouris, Syrgkanis and Tardos: Learning and Efficiency in Games with Dynamic Population 


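Summing over the periods in which the strategy is fixed and using the Cauchy-Schwarz inequality,
we can bound the total regret of each $i$:

$$r_i^* \le C_R\sum_{r=1}^{K_i+1}\sqrt{(\tau_{r+1} - \tau_r)\ln(NT)} \le C_R\sqrt{(K_i + 1)\sum_{r=1}^{K_i+1}(\tau_{r+1} - \tau_r)\ln(NT)} = C_R\sqrt{(K_i + 1)\,T\ln(NT)}. \tag{A.3}$$

Thus for each instance of $x^{1:T}$ and $v^{1:T}$, we have:

$$\sum_{t=1}^{T} c_i(s^t; v_i^t) \le \sum_{t=1}^{T} c_i(s_i^{*,t}, s_{-i}^t; v_i^t) + C_R\sqrt{(K_i + 1)\,T\ln(NT)}.$$

Adding over all players, and using the smoothness property, we get that

$$\sum_t C(s^t; v^t) \le \lambda\sum_t C(x^t; v^t) + \mu\sum_t C(s^t; v^t) + \sum_i C_R\sqrt{(K_i + 1)\,T\ln(NT)}.$$

By Cauchy-Schwarz, $\sum_i\sqrt{(K_i + 1)\,T\ln(NT)} \le \sqrt{n \cdot T \cdot \ln(NT)} \cdot \sqrt{\sum_i (K_i + 1)}$. Taking expectation
over the allocation and valuation sequence and using the $\alpha$-approximate optimality and
Jensen's inequality:

$$\sum_t \mathbb{E}[C(s^t; v^t)] \le \lambda\alpha\sum_t \mathbb{E}[\mathrm{Opt}(v^t)] + \mu\sum_t \mathbb{E}[C(s^t; v^t)] + n \cdot C_R\sqrt{T\ln(NT)\Big(1 + \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[K_i]\Big)}.$$

By the $k$-stability of the sequence, we have that $\sum_{i=1}^{n}\mathbb{E}[K_i] \le k \cdot n$. By re-arranging we get the
claimed bound. □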

Proof of Theorem 3.2. The proof follows along similar lines as the proof of Theorem 3.1 and
for completeness is given in Section EC.1 of the supplementary material. □

Proof of Theorem 3.3. Let $s_i^{*,t}$, $K_i$ and $r_i(s_i^{*,1:T}, s^{1:T}; v^{1:T})$ be defined exactly as in the proof
of Theorem 3.2, including the shorthand of $r_i^*$. For any period $[\tau_r, \tau_{r+1})$ that the strategy $s_i^{*,t}$ is
fixed, adaptive learning guarantees that the player's regret for this strategy is bounded by

$$R_i(\tau_r, \tau_{r+1}) \le C_R\sqrt{(\tau_{r+1} - \tau_r)\ln(NT)}.$$

Moreover, if in period $r$, $x_i^t = \emptyset$, then by Assumption 1 we have that $R_i(\tau_r, \tau_{r+1}) \le 0$. Thus, if we
denote with $X_{i,r}$ the indicator of whether in period $r$, $x_i^t \ne \emptyset$, we get:

$$R_i(\tau_r, \tau_{r+1}) \le C_R\sqrt{X_{i,r}(\tau_{r+1} - \tau_r)\ln(NT)}.$$

Summing over the $K_i + 1$ periods in which the strategy is fixed and using the Cauchy-Schwarz
inequality, we can bound the total regret of each $i$:

$$r_i^* \le C_R\sum_{r=1}^{K_i+1}\sqrt{X_{i,r}(\tau_{r+1} - \tau_r)\ln(NT)} \le C_R\sqrt{\sum_{r=1}^{K_i+1}X_{i,r}} \cdot \sqrt{\sum_{r=1}^{K_i+1}X_{i,r}(\tau_{r+1} - \tau_r)\ln(NT)}.$$
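Let $Y_i^t = 1\{x_i^t \ne \emptyset\}$. Then observe that:

$$\sum_{r=1}^{K_i+1}X_{i,r}(\tau_{r+1} - \tau_r) = \sum_{t=1}^{T}Y_i^t.$$

Replacing in the previous inequality, summing over all players and using Cauchy-Schwarz:

$$\sum_{i=1}^{n}r_i^* \le C_R\sum_{i=1}^{n}\sqrt{\sum_{r=1}^{K_i+1}X_{i,r}} \cdot \sqrt{\sum_{t=1}^{T}Y_i^t\ln(NT)} \le C_R\sqrt{\sum_{i=1}^{n}\sum_{r=1}^{K_i+1}X_{i,r}} \cdot \sqrt{\sum_{i=1}^{n}\sum_{t=1}^{T}Y_i^t\ln(NT)}.$$

Since each $x^t$ is a feasible allocation: $\sum_{i=1}^{n}Y_i^t \le m$. Hence, $\sum_{i=1}^{n}\sum_{t=1}^{T}Y_i^t \le mT$. Moreover:

$$\sum_{i=1}^{n}\sum_{r=1}^{K_i+1}X_{i,r} = \sum_{i=1}^{n}\sum_{r=1}^{K_i}X_{i,r} + \sum_{i=1}^{n}X_{i,K_i+1} \le m + \sum_{i=1}^{n}\sum_{r=1}^{K_i}X_{i,r}.$$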


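Now observe that for each instance of $(v^{1:T}, x^{1:T})$: $\sum_{i=1}^{n}\kappa_i(v_i^{1:T}, x_i^{1:T}) \ge \sum_{i=1}^{n}\sum_{r=1}^{K_i}X_{i,r}$, since the latter
summation sums all changes in type or allocation ranging from $r = 1$ to $K_i$, such that the allocation
$x_i^t$ in the period right before the $r$-th change is non-empty. This is at most the set of changes that
are accounted in $\sum_{i=1}^{n}\kappa_i(v_i^{1:T}, x_i^{1:T})$. It is an inequality as there could be an index $r$ at which both a
type and an allocation is changing and the summation only accounts it once, while $\kappa_i(v_i^{1:T}, x_i^{1:T})$
counts it twice, or there could be changes where $x_i^t \ne x_i^{t-1}$ and $x_i^t \ne \emptyset$, which are not accounted in
the above, but are accounted in $\kappa_i(v_i^{1:T}, x_i^{1:T})$. Combining all the above we get:

$$\sum_{i=1}^{n}r_i^* \le C_R\sqrt{m + \sum_{i=1}^{n}\kappa_i(v_i^{1:T}, x_i^{1:T})} \cdot \sqrt{mT\ln(NT)}.$$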


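By the no-regret property of each player, for each instance of $x^{1:T}$ and $v^{1:T}$, we have:

$$\sum_{t=1}^{T}u_i(s^t; v_i^t) \ge \sum_{t=1}^{T}u_i(s_i^{*,t}, s_{-i}^t; v_i^t) - r_i^*.$$

Adding over all players, and using the smoothness property and the bound on the sum of regrets,
we get that

$$\sum_t\sum_i u_i(s^t; v_i^t) \ge \lambda\sum_t W(x^t; v^t) - \mu\sum_t\mathcal{R}(s^t) - C_R\sqrt{m + \sum_{i=1}^{n}\kappa_i(v_i^{1:T}, x_i^{1:T})} \cdot \sqrt{mT\ln(NT)}.$$

Taking expectation over the allocation and valuation sequence and using the $\alpha$-approximate
optimality and Jensen's inequality:

$$\sum_t \mathbb{E}\Big[\sum_i u_i(s^t; v_i^t)\Big] \ge \frac{\lambda}{\alpha}\sum_t \mathbb{E}[\mathrm{Opt}(v^t)] - \mu\sum_t \mathbb{E}[\mathcal{R}(s^t)] - C_R\sqrt{m + \sum_{i=1}^{n}\mathbb{E}[\kappa_i(v_i^{1:T}, x_i^{1:T})]} \cdot \sqrt{mT\ln(NT)}.$$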
By the analogue of $k$-stability of the sequence, as defined in Equation (3.1), we have that

$$\sum_{i=1}^{n}\mathbb{E}[\kappa_i(v_i^{1:T}, x_i^{1:T})] \le k \cdot n.$$

By re-arranging and using the fact that $W(s^t; v^t) = \sum_i u_i(s^t; v_i^t) + \mathcal{R}(s^t)$ and that $\mathcal{R}(s^t) \le W(s^t; v^t)$
(since utilities are non-negative), we get the claimed bound. □


A.2. Proofs from Section 4.1


Proof of Lemma 4.1. The proof is similar to the proof of Syrgkanis and Tardos (2013) that the
mechanism is (1/2, l)-smoothness for continuous bids. Hence, we defer this proof to Appendix lEC.hl 
of the supplementary material. □ 

Proof of Lemma 4.2. The $2(1+\epsilon)$-approximation result holds as we lose an approximation
factor of 2 due to the greedy algorithm and another approximation factor of $(1+\epsilon)$ due to the
layers.

To show the stability let $\ell(v_i(j))$ be the highest $\ell$ such that $v_i(j) \ge \rho(1+\epsilon)^{\ell-1}$, i.e., the rounded
version of $v_i(j)$ is $\rho(1+\epsilon)^{\ell(v_i(j))-1}$, which we call the layer of this value. For example, any value in
the range $[\rho, \rho(1+\epsilon))$ is in layer 1. Let $\ell^t(j)$ denote $\ell(v_i(j))$ if item $j$ is assigned to player $i$ at time
$t$, and let $\ell^t(j) = 0$ if item $j$ is not assigned at time $t$. We will use the potential function

$$\Phi^t = \sum_j \ell^t(j)$$

to show stability.

We will show that changes in assignments correspond to increases in the potential function, and
the potential function can only decrease due to departures.

When a player who was assigned item $j$ leaves at time $t$, this immediately decreases the potential
function by $\ell^t(j) \le \log_{(1+\epsilon)}(1/\rho)$. Next we see how to restore the layered greedy solution after a
departure and after an arrival. We will claim that each change in the solution corresponds to an
increase in the potential function.

To get the desired stability, we will only reassign an item $j$ from a player $i$ to a different player
$i'$ if $\ell(v_{i'}(j)) > \ell(v_i(j))$, that is, if the rounded value is higher. If this is the case, we say that $i'$ is
eligible to be reassigned to item $j$, and similarly, we will say that player $i$ is eligible to be moved
from an item $j$ to a different item $j'$ if $\ell(v_i(j')) > \ell(v_i(j))$.

When a new player i arrives, we assign the player to her highest valued item j to which she 
is eligible to be assigned. This increases the potential function by at least one. Now the previous 
owner of the item j has no allocation, and again we assign this player to her highest value item to 
which she is eligible to be reassigned, further increasing the potential function. We continue this 
process till a layered greedy solution is obtained. 
After a player departs, the remaining solution may have an item $j$ that is unassigned. We reassign
item $j$ to the eligible player $i$ of highest value. This increases the potential function, but possibly
leaves a different item, one that $i$ used to have, unassigned. Again we assign this item to the eligible
player of highest value for the item, further increasing the potential function. We continue this
process until a layered greedy solution is obtained.

We have shown that each change in the assignment, other than player departures, increases the
potential function, allowing us to bound the expected number of changes. In each step $t$, each of
the up to $m$ players with an assigned item leaves with probability $p$, so the expected decrease in
the potential function over the $T$ steps of the algorithm is at most $pmT\log_{(1+\epsilon)}(1/\rho)$. The potential
function $\Phi$ is nonnegative, integral, and is bounded by $m\log_{(1+\epsilon)}(1/\rho)$. This implies that the
expected increase in the potential function during the algorithm is at most $m(1+pT)\log_{(1+\epsilon)}(1/\rho)$.

Since each change in the solution also increases the potential function by at least 1, the same
expression also bounds the total number of changes in the allocation, and each such change affects
at most two players. Thus the aggregate number of changes in allocation across players is at most
$2m(1+pT)\log_{(1+\epsilon)}(1/\rho)$.

Last we also need to account for the departures (or changes in type) of players that are already
allocated an item. Since there are $m$ such players in each iteration and each is replaced with
probability $p$, there are $mpT$ such changes in expectation. Thus the total number of changes in allocation
or changes in type of players that are allocated an item is at most $m(2+3pT)\log_{(1+\epsilon)}(1/\rho)$. The
average change for a player is an $n$th fraction of this, leading to the claimed bound using that
$T \ge 1/p$. □

Proof of Theorem 4.1. Apply Theorem 3.3 where $x^{1:T}$ is the outcome of the greedy-layered
mechanism; the fact that first price auction is $(\frac{1}{2}, 1)$-smooth by Lemma 4.1 and that there is a
stable close to optimal solution by Lemma 4.2 to get that:

$$\sum_t \mathbb{E}[W(s^t; v^t)] \ge \frac{1}{4(1+\epsilon)}\sum_t \mathbb{E}[\mathrm{Opt}(v^t)] - C_R\sqrt{T \cdot m \cdot \big(5 \cdot T \cdot m \cdot p \cdot \log_{(1+\epsilon)}(1/\rho) + m\big) \cdot \ln(NT)}$$

Using that $pT \ge 1$, we get the first claimed bound.

To get the multiplicative bound, it suffices to upper bound the expected aggregate regret by
$\frac{\epsilon}{4(1+\epsilon)}\sum_t \mathbb{E}[\mathrm{Opt}(v^t)]$, which is at least $\frac{\epsilon}{4(1+\epsilon)}Tm\rho$, by the assumptions $\epsilon < 1/3$ and that each item
is allocated for a value of $\rho$. To show that this is true, what we need to prove is the following (using
the inequality (4.2)):

$$mT \cdot C_R\sqrt{6 \cdot p \cdot \log_{(1+\epsilon)}(1/\rho) \cdot \ln(NT)} \le \frac{\epsilon}{4(1+\epsilon)}Tm\rho$$

which is true if

$$p \le \frac{\rho^2\epsilon^2}{6 \cdot 16(1+\epsilon)^2(C_R)^2\log_{(1+\epsilon)}(1/\rho)\ln(NT)}.$$

□
A.3. Proofs from Section 4.2

Proof of Lemma 4.3. For a given value $M$, if we allocate bandwidth to players up to the point
when their marginal value for bandwidth is $M$ or more, i.e., setting $x_i$ such that $u_i'(x_i) = M$
whenever $x_i > 0$ and $u_i'(0) < M$ when $x_i = 0$, then the allocations $\{x_i\}$ form the optimal solution
for total bandwidth $\sum_i x_i$. The idea of the proof is to consider this optimal solution to a smaller
bandwidth, and then round each allocation $x_i$ up to the next multiple of $\delta$. For a value $M$ let
$x_i(M) = 0$ if $u_i'(0) < M$, and otherwise set $x_i(M) \ge 0$ such that $u_i'(x_i(M)) = M$. So the optimal
solution is the allocation $x_i(M)$ for an $M$ such that $\sum_i x_i(M) = 1$.
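The threshold allocation $x_i(M)$ can be computed by a simple bisection on $M$. The sketch below illustrates this for quadratic utilities $u_i(x) = a_i x - \frac{c_i}{2}x^2$, for which $u_i'(x) = a_i - c_i x$ and $x_i(M) = \max(0, (a_i - M)/c_i)$. The utility family, the function names and the sample parameters are assumptions made for this example only; the sketch also assumes that total demand at $M = 0$ exceeds the available bandwidth.

```python
def allocation_at(M, a, c):
    """x_i(M) for u_i(x) = a_i*x - (c_i/2)*x^2: zero if u_i'(0) <= M, else where u_i'(x) = M."""
    return [max(0.0, (a_i - M) / c_i) for a_i, c_i in zip(a, c)]


def water_fill(a, c, total=1.0, iterations=100):
    """Bisect on the marginal value M until the allocations x_i(M) sum to `total`.

    Assumes sum_i x_i(0) > total, so that a positive threshold exists.
    Returns (M, allocations).
    """
    low, high = 0.0, max(a)  # at M = max(a) nobody receives any bandwidth
    for _ in range(iterations):
        mid = (low + high) / 2
        if sum(allocation_at(mid, a, c)) > total:
            low = mid   # too much bandwidth demanded: raise the threshold
        else:
            high = mid  # feasible: try a lower threshold
    return high, allocation_at(high, a, c)


# Two players with u_1'(x) = 3 - 2x and u_2'(x) = 2.5 - x (sample values).
M, x = water_fill([3.0, 2.5], [2.0, 1.0])
```

For these sample utilities the threshold is $M = 2$ and each player receives half of the bandwidth, since $u_1'(0.5) = u_2'(0.5) = 2$.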

Now for an allocation $x_i$ to player $i$ let $\bar{x}_i = \lceil x_i/\delta\rceil\delta$, the allocation rounded up to a multiple of
$\delta$. Now let $seg(M) = \sum_i \bar{x}_i(M)$. Clearly, $seg(M)$ is a monotone decreasing function of $M$, and is
right-continuous. Let $M$ be the minimum value such that $seg(M) \le 1$ (clearly $M \le \max_i u_i'(0)$).
Now we consider the following segmented allocation: for player $i$ such that $x_i(M) < \bar{x}_i(M)$, or
$x_i(M) = 0$ and $u_i'(0) < M$, we set $y_i = \bar{x}_i(M)$. For the remaining players we have that $x_i(M)$ is an
integer multiple of $\delta$ and $u_i'(x_i(M)) = M$. For these players we set $y_i$ to either $x_i(M)$ or $x_i(M) + \delta$
such that $\sum_i y_i = 1$. We note that such an allocation always exists, as $seg(M') > 1$ for any $M' < M$,
so there must be enough players with $\bar{x}_i(M') > \bar{x}_i(M)$; using $y_i = \bar{x}_i(M') = x_i(M) + \delta$ for a subset
of these players can make the total exactly 1.

Now we claim that the segmented allocation $\{y_i\}$ satisfies the claim of the lemma. Let $z_i =
y_i - x_i(M)$ be the additional allocation due to rounding, and let $z = \sum_i z_i$ denote the total rounding
used.

First note that the value of the optimum allocation is at most $\sum_i u_i(x_i(M)) + zM$. This is true,
as at the allocation $\{x_i(M)\}$ there is $z$ amount of space left to be allocated, and all players have
marginal utility at most $M$ for additional space.

To bound a player's utility for its allocation $y_i$, we use the fact that the second derivative of the
utility is at least $-\alpha$, so we get that

$$u_i(y_i) - u_i(x_i(M)) = \int_{x_i(M)}^{y_i}u_i'(\xi)\,d\xi \ge \int_{x_i(M)}^{y_i}\big(M - \alpha(\xi - x_i(M))\big)\,d\xi = Mz_i - \frac{1}{2}\alpha z_i^2$$

where the inequality used the fact that $u_i'(x_i(M)) = M$ for all players whose allocation was rounded,
i.e., who have $y_i > x_i(M)$.

To bound the utility of the segmented solution, we add the above bound for all players, and use
that $z_i \le \delta$ for all $i$ to get

$$\sum_i u_i(y_i) \ge \sum_i u_i(x_i(M)) + Mz - \frac{1}{2}\alpha z\delta.$$

Now by the choice of allocation $x_i(M)$ we have that $\sum_i u_i(x_i(M)) + Mz \ge M$, and so we can
bound the last term by

$$\frac{1}{2}\alpha z\delta \le \frac{1}{2}\alpha\delta \le \epsilon M \le \epsilon\Big(\sum_i u_i(x_i(M)) + Mz\Big)$$

using the bound on $\delta$ and the fact the first derivative is at least $\rho$.

Combining these bounds, we get the claimed overall bound

$$\mathrm{OPT} \le \sum_i u_i(x_i(M)) + Mz \le \frac{1}{1-\epsilon}\sum_i u_i(y_i)$$

as claimed. □

Proof of Lemma \4-4\ The (1 + e)-approximation holds as at most this is lost due to the layers 
whilst the non-layered greedy algorithm would be optimal as the valuation functions are concave. 

In order to prove stability we use the same potential function as in the matching markets: 

j 

We will again show that, unless some player who holds bandwidth departs, changes in the allocation 
correspond to equal increase in the potential function. Hence, we will show that decrease in the 
potential function happens only due to the departures of current holders of bandwidth. 

When a player j who is assigned ruj segments of bandwidth leaves, all her segments become 
free. Hence, this causes a decrease in the potential function of p5). Summing over all 

the players who have items, the expected decrease in the potential function is equal to p ■ 
rrij logj_|_g(l/p5) = I logj_|_g(l//55). This is the same as the expected decrease in the potential function 
in matching markets with a lower bound of p5 instead of p and 1/(5 segments instead of m items. 

When a player $i$ arrives, she either gets assigned to some segments or not. If she does not, then
she does not affect the allocation at all. If she is assigned to some segments then, given the tie-breaking
rule, her marginal value for each such segment is higher than that of the player who loses the
segment as a result. Hence, the potential function increases by at least the
number of segments she gets, and she causes no further changes in the allocation (for each segment
she takes, she might affect the allocation of at most one player). The total increase is bounded by
the decrease that previously occurred in the potential function, hence there is a correspondence to the
matching markets case.

The remainder of the proof follows exactly the same steps as the proof of Lemma 4.2, with $\rho\delta$
instead of $\rho$, as this is the minimum value of one segment (correspondingly item), and $1/\delta$, as this is
the number of segments (correspondingly items). □
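
The remark that a non-layered greedy algorithm is optimal for concave valuations can be illustrated with a short sketch. This is not the layered algorithm analyzed in the lemma (it has no $(1+\epsilon)$ layers and no special tie-breaking rule); the function names and the input format, where `valuations[i][k]` is player i's value for holding k segments, are assumptions made for the example.

```python
import heapq

def welfare(valuations, alloc):
    """Total value of an allocation; valuations[i][k] is i's value for k segments."""
    return sum(v[k] for v, k in zip(valuations, alloc))

def greedy_segments(valuations, num_segments):
    """Hand out unit segments one at a time to the largest marginal value.

    Each valuations[i] is assumed concave in k with valuations[i][0] == 0,
    so marginal values are non-increasing and plain greedy is optimal.
    """
    alloc = [0] * len(valuations)
    heap = []  # max-heap on marginal value, stored as (-marginal, player)
    for i, v in enumerate(valuations):
        if len(v) > 1:
            heapq.heappush(heap, (-(v[1] - v[0]), i))
    for _ in range(num_segments):
        if not heap:
            break
        neg_gain, i = heapq.heappop(heap)
        if -neg_gain <= 0:
            break  # no player gains from another segment
        alloc[i] += 1
        k = alloc[i]
        if k + 1 < len(valuations[i]):
            next_gain = valuations[i][k + 1] - valuations[i][k]
            heapq.heappush(heap, (-next_gain, i))
    return alloc

# Hypothetical instance: three players, four segments.
example = [[0, 5, 8, 9], [0, 4, 7, 7.5], [0, 3, 3.5, 3.6]]
example_alloc = greedy_segments(example, 4)
```

On small instances the result can be compared against exhaustive search over all allocations to confirm that the greedy welfare matches the optimum.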
Proof of Lemma 4.5. In (Syrgkanis and Tardos 2013), the proportional mechanism is proven
$(2-\sqrt{3}, 1)$-smooth. The deviating bid used is a bid selected uniformly at random from $[0, \lambda v_i(x^*)]$,
where $x^*$ is the optimal allocation and $\lambda$ is a carefully tuned parameter. If, instead of the optimal
solution, we selected any other solution $x^*$, the same result would hold for solution-based
smoothness. Letting $B_i$ be $i$'s realized deviating bid and $b_i$ be the bid played, this means that:






>{2-V3)Wix*)-Y.b, 


Recall that we defined the mechanisms with a discrete action space. Hence, we will consider
only bids that are multiples of $\zeta$. The deviating bid we will use is the rounding of the bid of
Syrgkanis and Tardos (2013) to multiples of $\zeta$. Hence the deviating bid now will be $\tilde{B}_i = \lceil B_i/\zeta \rceil \cdot \zeta$.

Summing over all players that hold items in x* 






= E. 




v,{B,,b^i) - B, 


P Es 


^ (vi{Bi,b_i) - Bi - 


> (2 - V3)W{x*)-Y^b,-Y,C = {2-V3-e)W(x*) - ^ b, 

i i j 

The first inequality holds from the monotonicity of the valuation function and the discretization
of bids. The second holds from the smoothness condition of the non-discretized version. The last
equality holds by replacing $\zeta = \epsilon\delta$ and by the fact that, for any $\delta$-segmented allocation $x$, $W(x) \le 1/\delta$
(as the valuation function is upper bounded by 1 and the number of players that can hold segments
is upper bounded by $1/\delta$). □

A.4. Proofs from Section 5.2


Proof of Theorem 5.2. To use the jointly differentially private algorithm of Rogers et al. (2015)
with a set of affine latency functions $\ell_e(x) = a_e x + b_e$, we need to scale them by $n \max_e(a_e + b_e)$ to
guarantee that $\ell_e(x) \le 1$ as required. This makes the functions $\gamma = 1/n$-Lipschitz. We will use the
jointly differentially private algorithm on the scaled problem, with privacy parameters $\epsilon(n)$, $\delta(n)$,
and $\beta(n)$ that will depend on the size of the population, and then rescale to the original costs, to
get a solution with expected cost: @ 


E[C(x; v)] < Opt(v) -|- • m ■ (max(?T,7, m))^^^ • e • log (4m • n • max(?T,7, m) • e//3) • \/ln(l/(5) 

® More precisely, the authors pro ved that they can find a fractional solution with cost at most Opt + R + 4R for R< 
(nm)( 2 ^+ 8 m) _j_ 2 ^. 2 iog(2mr//3).y/sTin(i/s) _ n(n-y^m)e then lose an additional my^2nln(m7i^ to get the 

integral solution. This can give an upper bound of 141 • • m- (maxfny, • log (4m • n ■ max(nj, m) ■ tjP) ■ 
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where 7 = 1/n, and the poly log term is the actual expression in (|5.3p . 

Corollary 5.1 expects an $\alpha$-approximation algorithm, so we need to bound the approximation
factor of this algorithm. To claim that it is a $(1+\frac{\eta}{2})$-approximation algorithm we need to guarantee
that

- r =— log(4m^ne(n)//3(n))7//n(l/(5(n))) • nmax(oe + be) < -Opt. 

Ve(n) « 2 

2 

A simple lower bound on the optimal solution is $\mathrm{Opt} \ge n \min_e a_e \cdot n/m = \frac{n^2}{m}\min_e a_e$, assuming all
players are congesting at least one element. Using this lower bound, and rearranging terms, we
can guarantee the desired approximation bound by assuming that

/ 1 41 777 ^/^ , _ 2?77 \ 2 

n> (— - log(4m^ne(n)/^(n))y/ln(l/h(n))) • max(ae + fee) •-:-) (A.5) 

^ Ve(n) e rymmeae/ 

To use this solution as a benchmark in Corollary 5.1 we need a small enough $\epsilon(n)$ and $\delta(n)$, as each
person leaving and arriving causes the benchmark solution to change for an $O(\epsilon(n) + \beta(n) + \delta(n))$
fraction of the population in expectation. We will let $\delta(n) = \beta(n) = \epsilon(n)/3$ and set $\epsilon(n)$ as small as
is allowed by Equation (A.5). Since $\epsilon(n)/\beta(n) = 3$ and $\delta(n) = \epsilon(n)/3$, we need:


InSh)) - ^log(12m“n) ■ max{<.. + (,.)■ 


2m 


rjmme 


Let /(n) = (141m^/^log(12m^n) • maxe(ae + 6e) * 


2m 


77 mine 


^ =0(m^( l°S(”^^»Wxe(ae + be) y'l ^ 


observe that /(n) = poly(m,log(re)). The latter inequality is satisfied ii 


77 mine Q'e 


e{n) = 


1 

n 


f{n) ln(3n) 


Moreover, by the latter parameters we also have that $\epsilon(n) + \beta(n) + \delta(n) \le \frac{5}{3}\epsilon(n)$.

Now applying Corollary 5.1 to the problem scaled by $m \cdot n \max_e(a_e + b_e)$ to guarantee the assumption
$\ell_e(x) \le 1$, that the loss functions for every player are bounded by 1, and scaling back, we get
that




(l + l) J]]Opt(v‘) + 

t 



• nT 


2p 


l + -ne{n) 


ln{NT) max(ae -|- be) 

e 


n • m 


^ Consider the cost minimization problem assuming the latency function of all edges is replaced with the latency
$\ell(x) = x \cdot \min_e a_e$. The value of the original cost minimization problem is at least the value of this new one. The
social cost in this new problem is simply $\min_e a_e \cdot \sum_e x_e^2$. Since each player congests at least one edge, the solution
must satisfy the constraint $\sum_e x_e \ge n$. By the convexity and symmetry of the objective function, the latter relaxed
problem achieves a minimum when all $x_e$ are identical and equal to $n/m$, in which case the value is $\frac{n^2}{m}\min_e a_e$.

® If we set $\epsilon(n) = \frac{1}{n} f(n)\ln(3n)$ then $\ln(3/\epsilon(n)) = \ln(3n) - \ln(f(n)) - \ln\ln(3n) \le \ln(3n)$. Thus $\epsilon(n) = \frac{1}{n} f(n)\ln(3n) \ge \frac{1}{n} f(n)\ln(3/\epsilon(n))$.
To get the desired bound, we need to make sure that the additive error is bounded by a small
multiple of Opt. Concretely, we need: 


2 

Using again the $\frac{n^2}{m}\min_e a_e \le \mathrm{Opt}(v^t)$ lower bound for the cost in each step $t$, we will now show
that we can guarantee this with the choice of $p$ suggested in the theorem. Without loss of generality
we can assume that $\epsilon(n) n \ge 3$ (since it holds if $m \ge 2$ and $n \ge 2$), so it suffices to show the following:

2 _ 5 77 

-Cr ■ nT\/2p ■ 2ne(n) In(A^T) max(ae + be)n ■ m < - ■ — - T ■ — minoe- 

2 e 2 2 m e 


ln{NT) max(ae + hf.)n • ttt, < - • — Opt(v‘) 


-CR-nTi 2p 


1 + -ne{n) 

O 


Finally, we use that the number of player strategies in a congestion game with $m$ elements
is clearly bounded by $N \le 2^m$, and hence $\ln(NT) \le m \ln T$. Using this fact, we can rearrange the
above inequality, and guarantee the required inequality if we have


5 


mine o.e 

■vf- 

(e(n)-n) 

Cr- 

12 

■ maxe(ae + bg) 

777 In T 

5 


mine 


1 

Cr- 

12 

m'^ ■ maxe(ae + bf) 

/(n) ln(3n)7nlnr 


/ 5 (miueae)^ 2^ ^ 

\Cr-12-14:1 (maxe(ae + 6e))^ ^ ^ \og{12m^n)\n{‘in)m}^hiT 

_ f 5 y / (mine Qe) V _1_ 

\Cr-12-1Ai) \maxe(ae + 6e) y \o^{12m‘^n)\n{2)n)m}°\iiT 

The latter completes the proof of the theorem. □ 


A.5. Proofs from Section 5.3


Proof of Lemma 5.5. Algorithm PAlloc with parameter $\alpha$ finds, w.p. $1-\beta$, a feasible solution
with social welfare at least


$$W(x; v) \ge \mathrm{Opt} - \alpha \cdot \max(ms, 2n)$$

w.p. 1 — /3, assnming (15.4h holds. We will nse the pms > p ■ c - lower bonnd on Opt, by 

the two first large market assumptions. Now setting ck = | • ^ with c from the first large market 
assumption, we get that: 


W{x\v) > Opt — amax(ms, n)> Opt 

® The algorithm assumes $ms \ge n$ and gives an additive error bound of $\alpha \cdot ms$. If $ms < n$, we run PAlloc with an extra
$m'$ items such that $(m' + m)s = a$ for some $a \in [n, n+s]$. For all the extra items every player has valuation 0 and, by
the way the algorithm works, no user gets an extra item in the algorithm's allocation. Applying the algorithm we have
an error bound of $\alpha(m + m')s \le \alpha \cdot (n + s) \le \alpha \cdot 2n$.
as required.

For a given supply $s$, the bound required from Equation (5.4) is exactly the one claimed in the
lemma. □

Proof of Theorem \5.3l We apply Lemma [53] with a e(n) = (3{n) that satisfy the condition, i.e., 
set 

e(n) = O ( ——-—-— ) polylog(n, m, s ), 

By Corollary 1 5.1 1 and Lemma 15.51 we have: 

v‘)] > - i) E®^[Opt(V)] - Tn-Cn^2p{l + 2nein))ln{NT) 

In order to lower bound the second term by ^ Opt*, we bound Opt* > pms as before, 

and then it suffices to prove the following: 

Tu-Cr- V2p(l + 2ne(n))ln(iVr) < 3 • | • pms 
Using the assumption that ms > cn^ and rearranging terms this is ensured by: 

CR^2p{l + 2ne{n))ln{NT) < ^pc 

Assuming wlog that n ■ e{n) > 1 and rearranging terms again, we get that this is ensured by 

„< = _^_V 

24(C'fl;)^ In(A^T) ^ \ In(A^T) n • polylog(n, m, s) / 

Using the assumption that ms > cn, this is implied by the condition of the theorem assumed. □ 
Supplementary Material


EC.1. Proof of Theorem 3.2

Theorem 3.2. Consider a repeated mechanism with dynamic population $\mathcal{M} = (M, T, p)$, such that
the stage mechanism $M$ is allocation-based $(\lambda, \mu)$-smooth. Suppose that $v^{1:T}$ and $x^{1:T}$ is a $k$-stable
sequence, such that $x^t$ is feasible (pointwise) and $\alpha$-approximately optimal (in expectation) for each
$t$, i.e. $\alpha \cdot E[W(x^t; v^t)] \ge E[\mathrm{Opt}(v^t)]$. If players use an adaptive learning algorithm with constant
$C_R$ then:


$$\sum_t E[W(s^t; v^t)] \ge \frac{\lambda}{\alpha \max\{1, \mu\}} \sum_t E[\mathrm{Opt}(v^t)] - n \cdot C_R \sqrt{T \cdot (k+1) \cdot \ln(NT)}$$

Proof of Theorem 3.2. Let $s_i^{*,1:T}$ be defined exactly as in the proof of Theorem 3.1 and $r_i(s_i^{*,1:T}, v^{1:T})$
be defined similarly as:

$$r_i(s_i^{*,1:T}, v^{1:T}) = \sum_{t=1}^{T} \Bigl( u_i(s_i^{*,t}, s_{-i}^t; v_i^t) - u_i(s^t; v_i^t) \Bigr)$$

For shorthand, we will denote this as $r_i^*$ in this proof. Following exactly the same arguments as in
the proof of Theorem 3.1 we can show that for each instance of $v^{1:T}$ and $x^{1:T}$:

$$r_i^* \le C_R \sqrt{(K_i + 1) T \ln(NT)},$$


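We sum the latter inequality over all players and take expectation over $v^{1:T}$ and $x^{1:T}$. Then we apply
the Cauchy-Schwarz and Jensen inequalities and the $k$-stability of the sequence, i.e.,

$$E\Bigl[\sum_i r_i^*\Bigr] \le E\Bigl[\sum_i C_R \sqrt{(K_i+1) T \ln(NT)}\Bigr] \le E\Bigl[C_R \sqrt{n \cdot T \cdot \ln(NT) \cdot \sum_{i=1}^{n}(K_i + 1)}\Bigr]$$
$$\le C_R \sqrt{n \cdot T \cdot \ln(NT) \cdot \sum_{i=1}^{n}(E[K_i] + 1)} \le n \cdot C_R \sqrt{T \cdot \ln(NT) \cdot (k+1)} \qquad \text{(EC.1.1)}$$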

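By the definition of regret for each instance of $x^{1:T}$ and $v^{1:T}$, we have:

$$\sum_{t=1}^{T} u_i(s^t; v_i^t) = \sum_{t=1}^{T} u_i(s_i^{*,t}, s_{-i}^t; v_i^t) - r_i^*$$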


Summing over all players and using the smooth mechanism property, we get that 
By re-arranging and using the fact that $W(s^t; v^t) = \sum_i u_i(s^t; v_i^t) + \mathcal{R}(s^t)$:

$$\sum_t W(s^t; v^t) + (\mu - 1)\sum_t \mathcal{R}(s^t) \ge \lambda \sum_t W(x^t; v^t) - \sum_i r_i^*$$

Taking expectation over the allocation and valuation sequence and using the $\alpha$-approximate
optimality and Inequality (EC.1.1):

$$\sum_t E[W(s^t; v^t)] + (\mu - 1)\sum_t E[\mathcal{R}(s^t)] \ge \frac{\lambda}{\alpha}\sum_t E[\mathrm{Opt}(v^t)] - n \cdot C_R\sqrt{T\ln(NT)(k+1)}.$$

If $\mu \le 1$ we get the Theorem, since revenue is non-negative. If $\mu > 1$, we will show that total
revenue is approximately bounded from above by welfare. Specifically, we will show that:

$$\sum_t E[\mathcal{R}(s^t)] \le \sum_t E[W(s^t; v^t)] + n \cdot C_R\sqrt{T\ln(NT)(k+1)}.$$

The latter is equivalent to showing:

$$\sum_t \sum_i E[u_i(s^t; v_i^t)] \ge -n \cdot C_R\sqrt{T\ln(NT)(k+1)}.$$

We use the fact that players can always play the empty strategy $\emptyset$ of exiting the mechanism
and receiving zero utility. Thus it suffices to bound the expected average per player regret with
respect to this empty fixed strategy. Define $\emptyset^{1:T}$ as the sequence of fixed empty strategies and denote
$r_i^{\emptyset} = r_i(\emptyset^{1:T}, v^{1:T})$. Then, using the no-regret definition with respect to this empty strategy for
each player $i$:

$$\sum_t u_i(s^t; v_i^t) = -r_i^{\emptyset}$$

Hence, for what we want to show, it suffices:

$$\sum_i E[r_i^{\emptyset}] \le n \cdot C_R\sqrt{T\ln(NT)(k+1)}. \qquad \text{(EC.1.2)}$$

Observe that since this strategy and the type of each player $i$ are fixed in the intervals defined
by the changes accounted for in $K_i$, from the exact same reasoning as what we used to
bound $r_i^*$, we can also derive that for each instance of $v^{1:T}$ and $x^{1:T}$:

$$r_i^{\emptyset} \le C_R\sqrt{(K_i+1)T\ln(NT)},$$

and thereby, similarly as in Inequality (EC.1.1), we get the desired property given in Equation
(EC.1.2).

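Hence, we get that:

$$\mu \sum_t E[W(s^t; v^t)] \ge \sum_t E[W(s^t; v^t)] + (\mu - 1)\sum_t E[\mathcal{R}(s^t)] - (\mu - 1)\, n \cdot C_R\sqrt{T\ln(NT)(k+1)}$$
$$\ge \frac{\lambda}{\alpha}\sum_t E[\mathrm{Opt}(v^t)] - \mu n \cdot C_R\sqrt{T\ln(NT)(k+1)}.$$

Dividing by $\mu$ yields the Theorem. □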
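
The Cauchy-Schwarz step used in this proof, $\sum_i \sqrt{K_i+1} \le \sqrt{n\sum_i (K_i+1)}$, can be checked numerically. The sketch below draws hypothetical change counts $K_i$ at random; the sample size, range, and seed are assumptions made only for illustration.

```python
import math
import random

def sum_of_roots(changes):
    """Left-hand side: sum over players of sqrt(K_i + 1)."""
    return sum(math.sqrt(k + 1) for k in changes)

def cauchy_schwarz_bound(changes):
    """Right-hand side: sqrt(n * sum over players of (K_i + 1))."""
    n = len(changes)
    return math.sqrt(n * sum(k + 1 for k in changes))

random.seed(0)
# Hypothetical numbers of changes K_i for 50 players.
changes = [random.randint(0, 20) for _ in range(50)]
assert sum_of_roots(changes) <= cauchy_schwarz_bound(changes) + 1e-9
```

Equality holds exactly when all $K_i$ are equal, which is why the bound in terms of the average number of changes loses nothing in the balanced case.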
EC.2. Stable Sequences via Marginal Privacy


Here we extend Corollary 5.1 to use a weaker form of privacy, marginal differential privacy,
showing that results on marginal differential privacy would have sufficed for our main results from
Section 5. This weaker form of privacy may make it easier to prove the existence of approximately
optimal private solutions. We first state marginal privacy formally and then prove the extension
of our results.


Definition EC.2.1 (Kannan et al. 2014). An algorithm $M : C^n \to O^n$ is $(\epsilon,\delta)$-marginally
differentially private if for every $i$, for every pair of $i$-neighbors $D, D' \in C^n$, every other player $j \ne i$,
and for every subset of outputs $S \subseteq O$ for player $j$:


$$\Pr[M(D)_j \in S] \le \exp(\epsilon)\Pr[M(D')_j \in S] + \delta$$

If $\delta = 0$, we say that $M$ is $\epsilon$-marginally differentially private.
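
For finite output spaces, the inequality in this definition can be checked directly for one pair of output distributions. The sketch below is an illustration under stated assumptions: `p` and `q` are hypothetical marginal output distributions of player $j$ on two neighboring inputs, represented as dictionaries, and the helper names are not from the paper. It uses the standard fact that the worst-case set $S$ consists of the outcomes where $p$ exceeds $e^{\epsilon} q$.

```python
import math

def worst_case_gap(p, q, eps):
    """max over output sets S of Pr_p[S] - exp(eps) * Pr_q[S].

    The maximizing S contains exactly the outcomes where
    p(o) > exp(eps) * q(o), so the maximum is a sum of positive parts.
    """
    outcomes = set(p) | set(q)
    return sum(max(0.0, p.get(o, 0.0) - math.exp(eps) * q.get(o, 0.0))
               for o in outcomes)

def is_marginally_close(p, q, eps, delta):
    """Check the (eps, delta) inequality in both directions for one pair."""
    tol = 1e-12  # guard against floating point rounding
    return (worst_case_gap(p, q, eps) <= delta + tol
            and worst_case_gap(q, p, eps) <= delta + tol)

# Hypothetical marginals of player j's output on two neighboring inputs.
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.55, "b": 0.45}
```

A full verification of marginal privacy would repeat this check for every pair of neighboring inputs and every other player $j$; the sketch covers a single pair only.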

Similar to joint privacy, we will allow for our algorithms to have a failure probability $\beta$, with
which they either return a very inefficient solution or an infeasible solution.

Theorem EC.2.1. Consider a repeated cost game with dynamic population $\Gamma = (G,T,p)$, such
that the stage game $G$ is allocation-based $(\lambda, \mu)$-smooth and $T \ge 1/p$. Assume that there exists an
$(\epsilon,\delta)$-marginal differentially private algorithm $A : \mathcal{V}^n \to \mathcal{X}^n$ with failure probability $\beta$ that satisfies
the conditions of Theorem 5.1. If all players use adaptive learning in the repeated game then the
overall cost of the solution is at most:

$$\sum_t E[C(s^t; v^t)] \le \frac{\lambda\alpha}{1-\mu}\sum_t \mathrm{Opt}(v^t) + \frac{nT}{1-\mu} C_R\sqrt{2p(1 + n(\epsilon + \beta + \delta))\ln(NT)}$$

Proof outline. The proof follows roughly the same outline as the proof of Corollary 5.1 (which
used Lemma 5.1 and Theorem 3.1). The outline of the changes needed is as follows.

1. The notion of marginal privacy is not strong enough to allow the kind of global coupling
offered by Theorem 5.1. Instead, we can couple the distributions $(v_i^{1:T}, x_i^{1:T})$ separately for each
player $i$, while ensuring that each sequence has expected number of changes in either her solution
or type at most $p \cdot T(1 + n(2\epsilon + \delta))$.

2. With no global coupling of solutions, we cannot directly use Theorem 3.1. Rather we need to
prove that the stable coupling of distributions of each player's value and outcome individually is
strong enough to reach the same conclusion.

We note that, while we can prove Theorems 3.1 and 3.2 without the need for global coupling,
Theorem 3.3, which requires Property 1, does need the global coupling used there. □

We state the claims used by the two steps, and offer a sketch of how to modify the proofs used
so far to prove the claims.
Lemma EC.2.1 (Stable sequences via marginal privacy). Suppose that there exists an algorithm
$A : \mathcal{V}^n \to \mathcal{X}^n$ that is $(\epsilon,\delta)$-marginal differentially private, takes as input a valuation
profile $v$ and outputs a distribution such that a sample from this distribution is feasible with probability
$1-\beta$, and is $\alpha$-approximately efficient in expectation (for $0 \le \epsilon \le 1/2$, $\alpha > 1$ and $\delta, \beta > 0$).

Consider the sequence of valuations $v^{1:T}$ produced by the adversary in a repeated cost-minimization
game with dynamic population $\Gamma = (G,p,T)$, and let $\mathcal{D}^{1:T}$ be the sequence of the
resulting outcome distributions produced by algorithm $A$. Then there exists a randomized sequence
of solutions $x_i^{1:T}$ for each player $i$, such that for each $1 \le t \le T$, conditional on $v^t$, for each $i$ the
distribution of $(v_i^t, x_i^t)$ is the $i$th marginal distribution of an $\alpha$-approximation to $\mathrm{Opt}(v^t)$, and the
distribution of the sequences $(v_i^{1:T}, x_i^{1:T})$ is such that the expected number of changes in $i$'s solution
or type is at most $p \cdot T(1 + n(2\epsilon + 2\beta + \delta))$ for each player $i$.

Proof of Lemma EC.2.1. This is an application of the coupling Lemma 5.2 for each marginal distribution
$\mathcal{D}_i$, where we use the optimal solution in the low probability event that the marginally differentially
private algorithm fails. Using the notation from the proof of Theorem 5.1, marginal privacy
bounds the effect of a change in valuation of player $j \ne i$ on the distribution $\mathcal{D}_i$. Note that there
is no requirement that the coupling is coordinated between the different coordinates, so the resulting
distribution of sequences $(v_i^{1:T}, x_i^{1:T})$ cannot be viewed as a distribution of global sequences.

Next we prove the analog of Theorem 3.1, which will finish our proof of Theorem EC.2.1.

Theorem EC.2.2 (Improved main theorem for cost-minimization games). Consider
a repeated cost game with dynamic population $\Gamma = (G,T,p)$, such that the stage game $G$ is
allocation-based $(\lambda,\mu)$-smooth. Suppose $\mathcal{D}^{1:T}$ is a sequence of solution distributions, such that the
solution in $\mathcal{D}^t$ has cost at most $\alpha$ times the minimum possible cost $\mathrm{Opt}(v^t)$ in expectation, and
suppose the marginal distributions $\mathcal{D}_i^{1:T}$ can be thought of as a randomized sequence of solutions
$x_i^{1:T}$ for each player $i$, such that the distribution of the sequences $(v_i^{1:T}, x_i^{1:T})$ has expected number
of changes in $i$'s solution or type at most $k$. If players use adaptive learning algorithms with
constant $C_R$ then:

$$\sum_t E[C(s^t; v^t)] \le \frac{\lambda\alpha}{1-\mu}\sum_t E[\mathrm{Opt}(v^t)] + \frac{n}{1-\mu} C_R\sqrt{T \cdot (k+1) \cdot \ln(NT)}$$

Proof of Theorem EC.2.2. We follow the outline of the proof of Theorem 3.1 till equation (A.4).
Then take expectation of the resulting inequality to get

$$\sum_{t=1}^{T} E\bigl(c_i(s^t; v_i^t)\bigr) \le \sum_{t=1}^{T} E\bigl(c_i(s_i^{*,t}, s_{-i}^t; v_i^t)\bigr) + C_R\sqrt{(k+1)T\ln(NT)}.$$















Lykouris, Syrgkanis and Tardos: Learning and Efficiency in Games with Dynamic Population 




Adding over all players, and using the smoothness property, we get that


$$\sum_t E(C(s^t; v^t)) \le \lambda\sum_t E(C(x^t; v^t)) + \mu\sum_t E(C(s^t; v^t)) + n \cdot C_R\sqrt{(k+1)T\ln(NT)},$$

which finishes the proof. □

We can prove the analogous theorems for mechanisms as well. 


Theorem EC.2.3 (Improved main theorem for mechanisms). Consider a repeated mechanism
with dynamic population $\mathcal{M} = (M,T,p)$, such that the stage mechanism $M$ is allocation-based
$(\lambda,\mu)$-smooth. Suppose $\mathcal{D}^{1:T}$ is a sequence of solution distributions, such that the solution in $\mathcal{D}^t$ has
social welfare at least a $1/\alpha$ fraction of the maximum possible value $\mathrm{Opt}(v^t)$ in expectation, and
suppose the marginal distributions $\mathcal{D}_i^{1:T}$ can be thought of as a randomized sequence of solutions
$x_i^{1:T}$ for each player $i$, such that the distribution of the sequences $(v_i^{1:T}, x_i^{1:T})$ has expected number
of changes in $i$'s solution or type at most $k$ for each player $i$. If players use adaptive learning
algorithms with constant $C_R$ then:

$$\sum_t E[W(s^t; v^t)] \ge \frac{\lambda}{\alpha\max\{1,\mu\}}\sum_t E[\mathrm{Opt}(v^t)] - n \cdot C_R \cdot \sqrt{T \cdot (k+1) \cdot \ln(NT)}.$$

Theorem EC.2.4. Consider a repeated mechanism with dynamic population $\mathcal{M} = (M,T,p)$, such
that the stage mechanism $M$ is allocation-based $(\lambda,\mu)$-smooth and $T \ge 1/p$. Assume that there
exists an $(\epsilon,\delta)$-marginal differentially private algorithm $A : \mathcal{V}^n \to \mathcal{X}^n$ with error parameter $\beta$ that
satisfies the conditions of Theorem 5.1. If all players use adaptive learning with constant $C_R$ in the
repeated mechanism then the overall welfare of the solution is at least

$$\sum_t E[W(s^t; v^t)] \ge \frac{\lambda}{\alpha\max\{1,\mu\}}\sum_t E[\mathrm{Opt}(v^t)] - nT \cdot C_R \cdot \sqrt{2p(1 + n(\epsilon + \beta + \delta))\ln(NT)}$$


EC.3. Large Congestion Games with General Latencies

Considering congestion games more generally, Rogers et al. (2015) assume that the latency
functions $\ell_e(x)$ satisfy the following conditions:

1. The functions $\ell_e(x)$ are non-decreasing, convex and twice differentiable.

2. Latency on each edge is bounded by 1, that is, $\ell_e(n) \le 1$.

3. The functions are $\gamma$-Lipschitz, that is $|\ell_e(x) - \ell_e(x')| \le \gamma|x - x'|$ for some parameter $0 < \gamma < 1$.

Under these assumptions, the algorithm outputs an integer solution that satisfies $(\epsilon,\delta)$ joint
differential privacy, and has an error probability of $\beta$ for parameters $\epsilon, \delta, \beta > 0$, and for player types
$v$ with probability $1-\beta$ returns a solution $x$ with close to minimum cost:

C{x-, v) < Opt(u) -|- 20- ^ log {2m^n^'^e{n) / l5{n))J\n (l/(5(n)) 

Ve V 
Polynomial Latencies. Using this algorithm, we can extend the result for Linear Congestion
games in Section 5.2 to polynomial latency functions. Consider congestion games whose latency
functions are polynomials of the form

$$\ell_e(x) = \sum_{j=0}^{d} a_{e,j} x^j$$

with $a_{e,d} > 0$ and $a_{e,j} \ge 0$ for all $j$. More formally:

Theorem EC.3.1. Consider a repeated congestion game with dynamic population $\Gamma = (G,T,p)$,
such that $T \ge 1/p$, the stage game $G$ is an atomic $(\lambda,\mu)$ allocation-based smooth congestion game
with polynomial latency functions $\ell_e(x) = \sum_{j=0}^{d} a_{e,j}x^j$ with $a_{e,d} > 0$ and $a_{e,j} \ge 0$ for all $e$ and $j \ne d$.
For any $\eta > 0$, if all players use adaptive learning algorithms with constant $C_R$ then the overall
expected cost is bounded by


$$\sum_t E[C(s^t; v^t)] \le \frac{\lambda}{1-\mu}(1 + \eta)\sum_t \mathrm{Opt}(v^t)$$


assuming the probability p of departures is at most: C ■rj'^ {d- ^ ■ (ln(T)) ^ for some 

C = e(( - . (C^)-2 . (iog2(6m^n)log(3n)d) ^ 

\ Y m8,Xg [ / 0 >e^d) J 

Proof of Theorem EC.3.1. The proof follows the same steps as the proof of Theorem 5.2.
Here we will illustrate just the places where the analysis differs. As there, let $\epsilon(n)$, $\delta(n)$
and $\beta(n)$ be the privacy parameters of the algorithm.

In order to make the latency function on each edge bounded by 1 as required by the algorithm,
we need to scale the latency of each edge by an upper bound on it. As upper bound, we will use
$n^d(\max_e \sum_{j=0}^{d} a_{e,j})$. Recall that for affine latencies, this upper bound was $n\max_e(a_e + b_e)$, so here
we are using its natural extension to polynomials of degree $d$.

This scaling down also makes the latencies $d/n$-Lipschitz, as required by the algorithm:

$$\frac{\ell_e(n) - \ell_e(n-1)}{n^d\bigl(\max_e\sum_{j=0}^{d}a_{e,j}\bigr)} \le \frac{(n^d - (n-1)^d)\cdot\bigl(\max_e\sum_{j=0}^{d}a_{e,j}\bigr)}{n^d\bigl(\max_e\sum_{j=0}^{d}a_{e,j}\bigr)} \le \frac{d}{n}$$
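
The $d/n$-Lipschitz claim for the scaled polynomial latencies can be verified exactly on small instances. The sketch below is an illustration only: the function names and the representation `edges[e][j]` for the integer coefficient $a_{e,j}$ are assumptions, and integer arithmetic is used so that no floating point error enters the comparison.

```python
def max_scaled_increment(edges, n):
    """Largest one-player latency increase and the scaling factor.

    edges[e][j] is the (integer) coefficient a_{e,j} of x^j on edge e.
    Returns (worst, scale) where worst is the largest value of
    l_e(x) - l_e(x-1) over edges e and loads 1 <= x <= n, and
    scale = n^d * max_e sum_j a_{e,j}.
    """
    d = max(len(c) for c in edges) - 1
    scale = n**d * max(sum(c) for c in edges)

    def latency(coeffs, x):
        return sum(a * x**j for j, a in enumerate(coeffs))

    worst = max(latency(c, x) - latency(c, x - 1)
                for c in edges for x in range(1, n + 1))
    return worst, scale

def is_d_over_n_lipschitz(edges, n):
    """Check worst / scale <= d / n by cross-multiplying integers."""
    d = max(len(c) for c in edges) - 1
    worst, scale = max_scaled_increment(edges, n)
    return worst * n <= d * scale

# Hypothetical degree-2 instance with two edges and ten players.
assert is_d_over_n_lipschitz([[1, 0, 2], [0, 3, 1]], 10)
```

The check relies on the same two facts as the displayed chain: each increment $x^j - (x-1)^j$ is at most $x^d - (x-1)^d$ for $j \le d$, and $n^d - (n-1)^d \le d\,n^{d-1}$.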

Similarly to the affine case, to claim that this is a $(1+\frac{\eta}{2})$-approximation algorithm, we need to
guarantee that



141m^^'^^/nd 


- a 

log {4mfnde{n)/fi{n)^\Jhi (l/(5(n)) • n‘^(max <S0PT 


3=0 


The lower bound we will use for the optimum is $\mathrm{Opt} \ge n\min_e a_{e,d}\left(\frac{n}{m}\right)^d = \frac{n^{d+1}}{m^d}\min_e a_{e,d}$, again
assuming that each player congests at least one element, and using the fact that all latency
functions are degree $d$. Hence, the desired approximation bound is guaranteed for:

^^ ^ r, TTTTTTT 2m'^ ''2 


n > 




: I - 

■ log {4mfnde{n)/l3{n))J\a (l/d(n)) 


r/mme a^^d 
The rest of the proof goes as the proof of Theorem 5.2, replacing $\epsilon(n)$ and the upper and lower
bounds accordingly. □

General Congestion Games. We can use the algorithm in the proof of Corollary 5.1 for general
congestion games satisfying the conditions of (Rogers et al. 2015), and we get the following Theorem.


Theorem EC.3.2. Consider a repeated congestion game with dynamic population $\Gamma = (G,T,p)$,
such that the stage game $G$ is allocation-based $(\lambda,\mu)$-smooth and $T \ge \frac{1}{p}$. Assume the game satisfies
the conditions above. For any parameters $\epsilon, \delta, \beta > 0$, if all players use adaptive learning algorithms
with constant $C_R$ in the repeated game then the overall cost of the solution is at most:

v‘)] < Opt(v*) + ^0(vp(iT^7T^T^ + 

where the O is a polylog term in T, e, 1/5, l//3,ra,m. 

Proof of Theorem EC.3.2. A small technical difficulty in using the proof of Corollary 5.1 in a
black box form is that Corollary 5.1, as well as the main Theorem 3.1 used to prove it, are stated
with multiplicative error bounds. However, using the additive error in the proof of Theorem 3.1
we get the following, where $v^t$ is the type vector of players, $s^t$ is the strategy vector played at time
$t$, and $x^t$ is the allocation that the differentially private algorithm generates. The assumption for
congestion games was that each individual latency is bounded by 1. Dividing each latency function
by $m$, the number of edges, to make the total latency bounded by 1, or equivalently scaling down
the error bounds from Corollary 5.1 by a factor of $m$, we get


$$\sum_t E[C(s^t; v^t)] \le \lambda\sum_t E[C(x^t; v^t)] + \mu\sum_t E[C(s^t; v^t)] + nmT\,C_R\sqrt{2p(1 + n(\epsilon + \beta + \delta))\ln(NT)}$$

Adding the bound for the quality of the solution $x$, and rearranging terms we get the claimed
bound. □


EC.4. Removing the dependence on T 

In our results presented so far, we have a logarithmic dependence on the total time $T$ the game is
played. Here we show that with a more careful analysis this dependence is not needed.

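Theorem EC.4.1. Under the assumptions of Theorem 3.1 the bound can be replaced by:

$$\sum_t E[C(s^t; v^t)] \le \frac{\lambda\alpha}{1-\mu}\sum_t E[\mathrm{Opt}(v^t)] + \frac{n}{1-\mu}\cdot C_R\sqrt{T(k+1)\ln\Bigl(\frac{2N}{p}\ln(n/\kappa)\Bigr)} + \frac{\kappa T}{1-\mu}$$

for all $\kappa \in (0, n/e)$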
Proof of Theorem EC.4.1. In the proof of Theorem 3.1 the dependence on the total time $T$
shows up in equation (A.2) bounding the regret of a player over time. The bound on regret is
derived from Theorem 2.1 (Luo and Schapire 2015), where regret over an interval of time $[\tau_1, \tau_2)$
is bounded with $\tau_2$ inside the logarithm. In equation (A.2) we used the upper bound $\tau_2 \le T$ for all
the regret terms.

If all players in our game live at most $T_{\max}$ steps, we can bound the total regret of the players
in one position $i$ (using the shorthand $r_i^*$ from the proof of Theorem 3.1) as:

$$r_i^* \le C_R\sqrt{(K_i+1)\sum_{r=1}^{K_i+1}(\tau_{r+1} - \tau_r)\ln(NT_{\max})} = C_R\sqrt{(K_i+1)T\ln(NT_{\max})}$$


With a high enough $T_{\max}$, only a very small fraction of the players will live more than $T_{\max}$ steps.
To bound the overall regret without any assumption on how long players can live, we can bound
the regret of such long living players by 1 in each step.

Let $L_i^t$ denote the random event that at time $t$ player $i$ has been alive for more than $T_{\max}$ steps
for a value of $T_{\max}$ that we will set later. Let also $L_{i,t}$ correspond to the indicator random variable
of the event $L_i^t$. Following the proof of Theorem 3.1 and bounding regret by 1 for each player $i$ at
any step $t$ that $L_i^t$ occurs, we get the following bound.


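$$\sum_t E[C(s^t; v^t)] \le \frac{\lambda\alpha}{1-\mu}\sum_t E[\mathrm{Opt}(v^t)] + \frac{n}{1-\mu}C_R\sqrt{T \cdot (k+1) \cdot \ln(NT_{\max})} + \frac{1}{1-\mu}E\Bigl[\sum_{i,t}L_{i,t}\Bigr]$$

To prove the theorem, we set $T_{\max} = \frac{2}{p}\ln(n/\kappa)$ and we will show that this suffices to get
$E\bigl[\sum_{i,t}L_{i,t}\bigr] \le \kappa T$, which finishes the proof.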

To bound the expected value of the sum $E[\sum_t L_{i,t}]$ for a given player $i$, divide the sequence of
$T$ time steps into intervals $I_j$ of length $T_{\max}/2$. For any interval $I_j$, let $B_{ij}$ denote the event that
player $i$ doesn't change value throughout this interval, and note that the probability of this event
is bounded by $\Pr[B_{ij}] = (1-p)^{T_{\max}/2}$. Now note that, if $L_{i,t} = 1$, i.e. player $i$ has lived more than
$T_{\max}$ steps at some time $t \in I_j$, there exists a sequence of at least one contiguous intervals ending at
$I_{j-1}$ such that player $i$ has not changed value. We will say that player $i$ at time $t$ is associated with
the first interval in this sequence. Note that, with this process, every player $i$ at some time step $t$
with $L_{i,t} = 1$ is associated to at most one interval $I_j$ where a bad event occurs. Hence, $E[\sum_t L_{i,t}]$
is at most the expected number of steps $t$ when player $i$ is associated with an interval where a bad
event occurred.


To get the claimed bound, we note the following facts:
• there are $n$ players (indices $i$) we need to consider,
• for each index $i$ we consider $2T/T_{\max}$ intervals,

• the probability that this interval is associated with one particular long living player $i$ is
bounded by $(1-p)^{T_{\max}/2}$,

• for every player index $i$, a bad event in an interval may incur an expected increase in $E[\sum_t L_{i,t}]$
of at most the expected lifespan of the user after the interval, i.e. $(1-p) + (1-p)^2 + \cdots \le 1/p$ (as
every player $i$ has a probability $p$ at each step to turn over).

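Combining these, we get the bound

$$E\Bigl[\sum_{i,t}L_{i,t}\Bigr] \le n \cdot \frac{2T}{T_{\max}} \cdot (1-p)^{T_{\max}/2} \cdot \frac{1}{p}$$

Substituting $T_{\max}$ and using that $(1-p)^{1/p} \le 1/e$ we get the following bound:

$$E\Bigl[\sum_{i,t}L_{i,t}\Bigr] \le n \cdot \frac{2Tp}{2\ln(n/\kappa)} \cdot e^{-\ln(n/\kappa)} \cdot \frac{1}{p} = n \cdot \frac{T}{\ln(n/\kappa)} \cdot \frac{\kappa}{n} \le \kappa T$$

where the last inequality follows from the assumption that $\kappa \le n/e$ and hence $\ln(n/\kappa) \ge 1$. □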
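
The final counting bound can be evaluated numerically for concrete parameters. The sketch below assumes $T_{\max} = \frac{2}{p}\ln(n/\kappa)$, which is the value the substitution in the last display is consistent with; the function name and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def long_lived_bound(n, T, p, kappa):
    """Evaluate n * (2T / T_max) * (1 - p)^(T_max / 2) * (1 / p).

    Assumes T_max = (2 / p) * ln(n / kappa), with 0 < kappa < n / e.
    """
    t_max = 2.0 * math.log(n / kappa) / p
    intervals = 2.0 * T / t_max            # number of half-length intervals
    survive = (1.0 - p) ** (t_max / 2.0)   # no turnover during one interval
    return n * intervals * survive / p

# Hypothetical parameters: 100 players, 10000 steps, turnover 1%, kappa = 1.
assert long_lived_bound(100, 10000, 0.01, 1.0) <= 1.0 * 10000
```

For these values the bound is well below $\kappa T$, reflecting the slack from $(1-p)^{1/p} \le 1/e$ and from $\ln(n/\kappa) \ge 1$.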


Corollary EC.4.1. In Theorem \5.Sl it suffices to bound the probability of departures by 


O 


-p -rre polylog re,m,p, 

maXe(ae + Oe) / V V maXe(ae + 6e) 


Proof of Corollary \EC.4-1\ From Theorem lEC. 4. 11 by setting k = §- 

2 m^Z^xe{a+b ) ’ together with the conditions of Theorem 15.21 we get that the ap] 
antee in the Theorem holds if the probability of departure p is at most: 


O 


miUp Op 


maxe(ae + fee) 


p) • (rre^° log^(?re • re) In(re)) 


In 


2 ln(77/K) 

P 


which essentially is derived by replacing T with bound stated in Theorem 15.21 

To observe that, note that if we do the analysis in the proof of Theorem 15.21 but using p/2 
wherever we used p and replace re as described, the first two terms of the RHS of Theorem lEC. 4. l1 
after rescaling back with rre • re • maxe(ae + fee) can be upper bounded by 


A 

1-p 


1 + ^ 


E 


y~]opT(-\ 


(given that A/(l — p) > 1). Moreover, the last term in the RHS of Theorem lEC.4.1l is also bounded 
by ^E[^j Opt(v*)] , after rescaling, by our choice of k and by the lower bound on the optimum 
of — miup Op. 

m ^ ^ 
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Thus the requirement on the probability p is of the form: 


for 


and 


P < 


A 


ln(B/p) 


A = 0 


mmp Of. 


maXe(ae + bg) 


rj\ • log^(m • n) ln(n)) 


-1 


B = 2 ln(n/fi:) > 1. 


(EC.EC.4.1) 


We argue that $p \le \frac{A}{2\log(2B/A)}$ satisfies Inequality (EC.4.1) and hence is a sufficient upper bound
on the probability $p$. Observe that the function $g(p) = p\log(B/p)$ is monotone increasing in the
region $p \in [0, B/e]$. Wlog in this analysis assume that $p \le 1/e$, hence the latter monotonicity holds
in this range, since $B \ge 1$. Moreover, we might as well assume that $\frac{A}{2\log(2B/A)} \le \frac{1}{e}$, since we can
always assume that $A \le 1/e$. Thus if $p \le \frac{A}{2\log(2B/A)}$, then:

$$p\log(B/p) \le g\Bigl(\frac{A}{2\log(2B/A)}\Bigr) = \frac{A}{2\log(2B/A)}\log\Bigl(\frac{2B\log(2B/A)}{A}\Bigr)$$
$$= \frac{A}{2\log(2B/A)}\Bigl(\log\Bigl(\frac{2B}{A}\Bigr) + \log\log\Bigl(\frac{2B}{A}\Bigr)\Bigr) \le \frac{A}{2\log(2B/A)} \cdot 2\log\Bigl(\frac{2B}{A}\Bigr) = A,$$

which is exactly inequality (EC.4.1).

Thus we conclude that $p \le \frac{A}{2\log(2B/A)}$ suffices to get the efficiency guarantee we want. Replacing
$A$ and $B$ in the latter gives an upper bound of the asymptotic form stated in the corollary,
which concludes the proof. □


Theorem EC.4.2. Under the assumptions of Theorem \3.A. the bound can be replaced by: 

A 




> 


a max{ 1 


—y ^ E[Opt(v*)] — n • T{k + 1) In In (n/«:)^ — bT 


for all K G (0,ra/e), where the term under the square root improves to an T ■ m{k ■ n + 
m) In ^ ^ In (m/ k) ^ under Property [I] 


Proof of Theorem EC.4-3. The proof of the first part of the theorem has the same steps as the 
proof of Theorem lEC. 4. ll hence we omit it. For the second part, the proof is also the same albeit 
invoking the proof of Theorem 1,4.31 to replace n with m. The main difference, for the latter result 
is that, under Property [H it suffices to set Tmax = 21n(m/K), as in the second term we add at most 
ni rp ^ summands. Hence, we can totally remove the dependence on n. □ 
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Corollary EC. 4.2. Theorem [573 continues to hold with an extra r] multiplicative loss in 
the welfare, even under the weaker requirement that the probability of departure is at most: 
O (- , , ^ I, i.e. there is no dependence on T at all in the upper bound. 

Proof of Corollary \EC.4. 2 . Similarly to the Proof of Corollary IEC.4.11 we set ^ = ■ 




mpolylog(n,m,s) 

and B = N\\i{l/'q- p-c). The claim then follows from the previous theorem by setting K = r]-p-c-n 
□ 


Corollary EC. 4.3. Theorem continues to hold with an extra e multiplicative loss in 
the welfare, even under the weaker requirement that the probability of departure is at most: 


O 


i.e. there is no dependence on T at all in the upper bound. 


log(i+j) (l/p)polylog(Af,p,e) ^ 

Proof of Corollary EC.4.3. Again, similarly to the proof of Corollary EC.4.1, we set A =
96 .(i+e)^^og(i — ){i/p) ^ ~ (l/(^P)). The claim then follows from the previous theorem by
setting $\kappa = \epsilon \cdot m\rho$. □


Corollary EC.4.4. Theorem 4.2 continues to hold with an extra $\epsilon$ multiplicative loss in
the welfare, even under the weaker requirement that the probability of departure is at most: 

4 4 

96 a^(i-£^)^iog(i^!’^)(a(i-e)/p^£)in(AfT), i.e. there is no dependence on $T$ at all in the upper bound.

Proof of Corollary EC.4.4. Similarly to the proof of Corollary EC.4.3, we set $A = \cdots$ and
$B = N \ln(1/(\epsilon \rho \delta))$, where $\delta = \cdots$. The claim then follows from the previous theorem by setting
$\kappa = \epsilon \rho$. □


EC.5. Smoothness of First Price Auction with Discrete Bid Spaces 

Lemma 4.1. The simultaneous first price mechanism, where players are restricted to bid on at most
$d$ items and on each item submit a bid that is a multiple of $\delta \cdot \rho$, is a solution-based $(\frac{1}{2} - \delta, 1)$-smooth
mechanism when players have submodular valuations such that all marginals are either $0$ or at
least $\rho$ and each player wants at most $d$ items, i.e. $v_i(S) = \max_{T \subseteq S : |T| = d} v_i(T)$.


Proof of Lemma 4.1. Consider a valuation profile $v = (v_1, \ldots, v_n)$ for the $n$ players and a bid
profile $b = (b_1, \ldots, b_n)$. Each valuation $v_i$ is submodular and thereby also falls into the class of XOS
valuations (Lehmann et al. 2001), i.e. it can be expressed as a maximum over additive valuations.
More formally, for some index set $\mathcal{L}_i$:

$$v_i(S) = \max_{\ell \in \mathcal{L}_i} \sum_{j \in S} a_{ij}^{\ell}.$$


Moreover, by the assumption that marginals are either $0$ or at least $\rho$, it can be easily shown that
each $a_{ij}^{\ell}$ is either $0$ or at least $\rho$. Similarly, when the player has value for at most $d$ types of items, it
can also be shown that for any $S$ at most $d$ of the $a_{ij}^{\ell}$ will be non-zero.
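
As an illustration of the XOS representation used here, the following minimal sketch evaluates a valuation given as a list of additive clauses and finds a maximizing clause for a bundle. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
def xos_value(clauses, bundle):
    """Value of `bundle` under an XOS valuation given as additive clauses.

    clauses: list of dicts mapping item -> additive value (missing items count as 0)
    bundle:  iterable of items
    """
    items = set(bundle)
    return max(sum(clause.get(j, 0) for j in items) for clause in clauses)


def maximizing_clause(clauses, bundle):
    """Index of a clause attaining the maximum for `bundle` (first one on ties)."""
    items = set(bundle)
    totals = [sum(clause.get(j, 0) for j in items) for clause in clauses]
    return totals.index(max(totals))


# Toy instance (made-up numbers): two additive clauses over items a, b, c.
clauses = [{"a": 3, "b": 1}, {"a": 1, "b": 2, "c": 2}]
value_ab = xos_value(clauses, {"a", "b"})          # clause 0 gives 4, clause 1 gives 3
best_for_bc = maximizing_clause(clauses, {"b", "c"})  # clause 1 gives 4, clause 0 gives 1
```

The maximizing clause plays the role of $\ell^*(x_i)$ in the deviation constructed in the proof: the deviating player only looks at the additive values of that single clause.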

Consider a feasible allocation $x = (x_1, \ldots, x_n)$ of the items to the bidders, where $x_i$ is the set of
types of items allocated to player $i$ (the latter is feasible if each item is never allocated more than
its supply). Consider the following deviation $b_i^*(v_i, x_i)$ that is related to the valuation of player $i$
and to allocation $x_i$. Let $\ell^*(x_i) = \arg\max_{\ell \in \mathcal{L}_i} \sum_{j \in x_i} a_{ij}^{\ell}$. Then on each item $j \in x_i$ with $a_{ij}^{\ell^*(x_i)} > 0$,
submit

$$\left\lfloor \frac{a_{ij}^{\ell^*(x_i)}}{2} \right\rfloor_{\delta \cdot \rho}.$$

On each $j \notin x_i$, submit a zero bid. This will submit at most $d$ non-zero bids.
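
The deviation can be sketched in a few lines of code. This is only an illustration under stated assumptions: the helper names `floor_to_grid` and `deviation_bids` and all numbers are hypothetical, the dictionary `clause_values` stands for the additive values of the maximizing clause, and exact fractions are used so that rounding down to a multiple of $\delta \cdot \rho$ is not affected by floating-point error.

```python
from fractions import Fraction
from math import floor


def floor_to_grid(x, step):
    """Largest multiple of `step` that is less than or equal to x."""
    return floor(x / step) * step


def deviation_bids(clause_values, allocation, delta, rho):
    """Bids of the deviating player on the items of `allocation`.

    clause_values: dict item -> additive value of the maximizing clause
    allocation:    set of items assigned to the player
    Items that do not appear in the result receive a zero bid.
    """
    step = delta * rho
    bids = {}
    for item in allocation:
        a = clause_values.get(item, Fraction(0))
        if a > 0:
            # Bid half of the additive value, rounded down to the bid grid.
            bids[item] = floor_to_grid(a / 2, step)
    return bids


# Made-up example: delta = 1/10, rho = 1, so bids are multiples of 1/10.
delta, rho = Fraction(1, 10), Fraction(1)
clause_values = {"x": Fraction(5, 4), "y": Fraction(2), "z": Fraction(0)}
bids = deviation_bids(clause_values, {"x", "y", "z"}, delta, rho)
```

In this toy instance the player bids 3/5 on item x and 1 on item y, and places no bid on item z, whose additive value is zero.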


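Now we argue that these deviations imply the solution-based smoothness property. Let $p_j(b)$ be the
lowest winning bid on item $j$ under bid profile $b$. Observe that for each $j$, if $p_j(b) < \left\lfloor \frac{a_{ij}^{\ell^*(x_i)}}{2} \right\rfloor_{\delta \cdot \rho}$,
the player wins item $j$ and pays $\left\lfloor \frac{a_{ij}^{\ell^*(x_i)}}{2} \right\rfloor_{\delta \cdot \rho}$. Thus we get:

$$u_i(b_i^*(v_i, x_i), b_{-i}; v_i) \;\ge\; \sum_{j \in x_i} \left( \left\lfloor \frac{a_{ij}^{\ell^*(x_i)}}{2} \right\rfloor_{\delta \cdot \rho} - p_j(b) \right) \;\ge\; \left(\frac{1}{2} - \delta\right) v_i(x_i) - \sum_{j \in x_i} p_j(b).$$

Summing over all players and observing that $\mathcal{R}(b) \ge \sum_i \sum_{j \in x_i} p_j(b)$, we get the lemma. □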
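
The rounding step behind the $(\frac{1}{2} - \delta)$ factor can be spelled out as follows. This is a sketch under the lemma's assumptions, writing $a$ for any non-zero additive value (so $a \ge \rho$) and $\lfloor x \rfloor_{\delta \cdot \rho}$ for the largest multiple of $\delta \cdot \rho$ that does not exceed $x$:

```latex
\begin{align*}
\left\lfloor \tfrac{a}{2} \right\rfloor_{\delta \cdot \rho}
  &\ge \tfrac{a}{2} - \delta \cdot \rho
     && \text{(rounding down loses less than one grid step)} \\
  &\ge \tfrac{a}{2} - \delta \cdot a
     && \text{(since } a \ge \rho \text{)} \\
  &= \left( \tfrac{1}{2} - \delta \right) a .
\end{align*}
```

Summing this over the items of $x_i$ with non-zero additive value, and using that the maximizing clause satisfies $\sum_{j \in x_i} a_{ij}^{\ell^*(x_i)} = v_i(x_i)$, gives the term $(\frac{1}{2} - \delta)\, v_i(x_i)$ in the utility bound.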


We denote with $\lfloor x \rfloor_{\delta \cdot \rho}$ the closest multiple of $\delta \cdot \rho$ that is less than or equal to $x$.































